Microservices tracing with
Spring Cloud and Zipkin
Marcin Grzejszczak
Marcin Grzejszczak @mgrzejszczak, 24 June 2016
About me
Developer at Pivotal
Part of Spring Cloud Team
Working with OSS:
● Accurest - Consumer Driven Contracts verifier for Java
● JSON Assert - fluent JSON assertions
● Spock Subjects Collaborators Extension
● Gradle Test Profiler
● Up To Date Gradle Plugin
TWITTER: @MGrzejszczak
BLOG: http://TOOMUCHCODING.COM
Agenda
What is distributed tracing?
How to correlate logs with Spring Cloud Sleuth?
How to visualize latency with Spring Cloud Sleuth and Zipkin?
An ordinary system...
UI calls backend
UI -> BACKEND
Everything is awesome
CLICK 200
Until it’s not
CLICK 500
Time to debug
https://tonysbologna.files.wordpress.com/2015/09/mario-and-luigi.jpg?w=468&h=578&crop=1
It doesn’t look like this
More like this
On which server / instance
was the exception thrown?
SSH and grep for ERROR to find it?
Distributed tracing - terminology
Span
Trace
Logs (annotations)
Tags (binary annotations)
Span
The basic unit of work (e.g. sending an RPC)
● Spans are started and stopped
● They keep track of their timing information
● Once you create a span, you must stop it at some point in the future
● Every span except the root has a parent, and a span can have multiple children
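A minimal plain-Java sketch of the span lifecycle described above. The `SimpleSpan` class is hypothetical (it is not Sleuth's `Span` API); it only shows that a span records when it was started and stopped, and that a `finally` block guarantees the span gets stopped even when the traced work throws.

```java
// Hypothetical minimal span used only to illustrate the lifecycle; this is not Sleuth's Span API.
class SimpleSpan implements AutoCloseable {
    final String name;
    private final long startNanos;
    private long stopNanos;
    private boolean stopped;

    SimpleSpan(String name) {
        this.name = name;
        this.startNanos = System.nanoTime(); // the span is started when it is created
    }

    boolean isStopped() {
        return stopped;
    }

    // Timing information is only meaningful once the span has been stopped.
    long durationNanos() {
        if (!stopped) {
            throw new IllegalStateException("span '" + name + "' was never stopped");
        }
        return stopNanos - startNanos;
    }

    // Stops the span; calling it again has no effect.
    @Override
    public void close() {
        if (!stopped) {
            stopNanos = System.nanoTime();
            stopped = true;
        }
    }

    // Runs the work inside a new span and always stops the span, even if the work throws.
    static SimpleSpan traced(String name, Runnable work) {
        SimpleSpan span = new SimpleSpan(name);
        try {
            work.run();
        } finally {
            span.close();
        }
        return span;
    }

    public static void main(String[] args) {
        SimpleSpan span = traced("send-rpc", () -> { /* the remote call would go here */ });
        System.out.println(span.name + " stopped=" + span.isStopped()
                + " duration>=0=" + (span.durationNanos() >= 0));
    }
}
```

Because `SimpleSpan` is `AutoCloseable`, a try-with-resources block gives the same "always stopped" guarantee without the helper method.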
Trace
A set of spans forming a tree-like structure.
● For example, if you are running a book store then
○ a trace could be retrieving a list of available books
○ assuming that retrieving the books requires sending 3 requests to 3 services, you could have at least 3 spans (1 for each hop) forming 1 trace
Figure: one request passing through four services, all under Trace Id = X
● SERVICE 1 receives a REQUEST with no Trace Id and no Span Id, so it starts trace X with Span Id = A
● SERVICE 1 → SERVICE 2: Span Id = B (Client Send, Server Received, Server Sent, Client Received); SERVICE 2 does its own work in Span Id = C
● SERVICE 2 → SERVICE 3: Span Id = D (Client Send, Server Received, Server Sent, Client Received); SERVICE 3 does its own work in Span Id = E
● SERVICE 2 → SERVICE 4: Span Id = F (Client Send, Server Received, Server Sent, Client Received); SERVICE 4 does its own work in Span Id = G
Span Id = A, Parent Id = null
Span Id = B, Parent Id = A
Span Id = C, Parent Id = B
Span Id = D, Parent Id = C
Span Id = E, Parent Id = D
Span Id = F, Parent Id = C
Span Id = G, Parent Id = F
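The parent ids above are all that is needed to rebuild the tree of a trace. The sketch below is a hypothetical helper (not part of Sleuth or Zipkin) that groups the seven spans of trace X by parent and prints them depth-first, one indentation level per hop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: rebuilds the span tree of one trace from {spanId, parentId} pairs.
class SpanTree {
    // Maps each parent id to its children in input order; the root is stored under the key null.
    static Map<String, List<String>> childrenByParent(String[][] spans) {
        Map<String, List<String>> children = new LinkedHashMap<>();
        for (String[] span : spans) {
            String spanId = span[0];
            String parentId = span[1];
            children.computeIfAbsent(parentId, k -> new ArrayList<>()).add(spanId);
        }
        return children;
    }

    // Appends the subtree below parentId depth-first, indenting each level by two spaces.
    static void render(Map<String, List<String>> children, String parentId, int depth, StringBuilder out) {
        for (String child : children.getOrDefault(parentId, List.of())) {
            out.append("  ".repeat(depth)).append(child).append('\n');
            render(children, child, depth + 1, out);
        }
    }

    static String render(String[][] spans) {
        StringBuilder out = new StringBuilder();
        render(childrenByParent(spans), null, 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The spans of trace X from the slide: {spanId, parentId}.
        String[][] spans = {
            {"A", null}, {"B", "A"}, {"C", "B"}, {"D", "C"},
            {"E", "D"}, {"F", "C"}, {"G", "F"}
        };
        System.out.print(render(spans));
    }
}
```

Running it prints A at the top, then B, then C, with D (and its child E) and F (and its child G) as the two children of C.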
Is it that simple?
How do you pass tracing information (incl. Trace ID)
between:
● different libraries?
● thread pools?
● asynchronous communication?
● …?
What if you forget about a thread pool?
Figure: the call chain SERVICE 1 → SERVICE 2 → SERVICE 3 and SERVICE 4. The request arrives with NO TRACE and gets id A, but along the way id A is replaced by unrelated ids (B, C, D), so the logs of this one request can no longer be correlated.
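Sleuth handles this hand-off for the libraries it supports. To show what "forgetting a thread pool" means, here is a plain-Java sketch in which a hypothetical `ThreadLocal` stands in for the tracing context: a task submitted as-is does not see the caller's trace id, while a task wrapped by the (also hypothetical) `wrap` helper carries it over to the pool thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class TraceContextPropagation {
    // Hypothetical holder for the current trace id; Sleuth keeps its own, richer context.
    static final ThreadLocal<String> CURRENT_TRACE_ID = new ThreadLocal<>();

    // Captures the caller's trace id when the task is created and restores it on the worker thread.
    static <T> Callable<T> wrap(Callable<T> task) {
        String captured = CURRENT_TRACE_ID.get();
        return () -> {
            String previous = CURRENT_TRACE_ID.get();
            CURRENT_TRACE_ID.set(captured);
            try {
                return task.call();
            } finally {
                // Put back what the pooled thread had before, so ids do not leak into later tasks.
                if (previous == null) {
                    CURRENT_TRACE_ID.remove();
                } else {
                    CURRENT_TRACE_ID.set(previous);
                }
            }
        };
    }

    // Returns the trace id seen on the pool thread, first without and then with the wrapper.
    static String[] run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CURRENT_TRACE_ID.set("X");
            Callable<String> readTraceId = CURRENT_TRACE_ID::get;
            String unwrapped = pool.submit(readTraceId).get(); // the worker thread never saw "X"
            String wrapped = pool.submit(wrap(readTraceId)).get();
            return new String[] {unwrapped, wrapped};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            CURRENT_TRACE_ID.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] ids = run();
        // prints "without wrapper: null, with wrapper: X"
        System.out.println("without wrapper: " + ids[0] + ", with wrapper: " + ids[1]);
    }
}
```

The capture happens when the task is created, on the caller's thread; that is the moment the trace id is still known.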
Log correlation with Spring Cloud Sleuth
We take care of passing tracing information between threads / libraries / contexts for
● Hystrix
● RxJava
● RestTemplate
● Feign
● Messaging with Spring Integration
● Zuul
● ...
If you don’t do anything unusual, there’s nothing you need to do to make
Sleuth work. Check the docs for more info.
Now let’s aggregate the logs!
Instead of SSHing into each machine, aggregate the logs!
● With Cloud Foundry’s (CF) Loggregator, the logs from different instances are streamed into a single place
● You can harvest your logs with Logstash Forwarder / Filebeat
● You can use the ELK stack (Elasticsearch, Logstash, Kibana) to stream, search and visualize the logs
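Once the logs sit in one place, correlating them boils down to filtering by trace id. A small sketch, assuming every line carries a `[appName,traceId,spanId,exportable]` block similar to the one Sleuth adds to its log pattern; the class name and the sample lines are made up for illustration.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical log filter; assumes lines contain "[appName,traceId,spanId,exportable]".
class LogCorrelation {
    static final Pattern TRACE_BLOCK =
            Pattern.compile("\\[([^,\\]]*),([0-9a-f]+),([0-9a-f]+),(true|false)\\]");

    // Returns the trace id found in the line, or null if the line has no tracing block.
    static String traceIdOf(String line) {
        Matcher m = TRACE_BLOCK.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    // Keeps only the lines that belong to the given trace, in their original order.
    static List<String> linesForTrace(List<String> aggregatedLines, String traceId) {
        return aggregatedLines.stream()
                .filter(line -> traceId.equals(traceIdOf(line)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "INFO [service1,3a1b,3a1b,false] Hello from service1",
                "INFO [service2,3a1b,4c2d,false] Calling service3",
                "INFO [service2,7f00,7f00,false] Unrelated request",
                "ERROR [service3,3a1b,5e3f,false] Read timed out");
        linesForTrace(lines, "3a1b").forEach(System.out::println);
    }
}
```

In an ELK setup the same idea is a search on the trace id field in Kibana instead of a regex in Java.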
Spring Cloud Sleuth with Maven
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.SR1</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-sleuth</artifactId>
    </dependency>
</dependencies>
Spring Cloud Sleuth with Gradle
// requires the io.spring.dependency-management Gradle plugin to be applied
dependencies {
    compile "org.springframework.cloud:spring-cloud-starter-sleuth"
}
dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.SR1"
    }
}
Figure: the demo application
● Calling /start on SERVICE 1 sends a REQUEST to SERVICE 2, which calls SERVICE 3 and SERVICE 4
● SERVICE 3 responds “Hello from service3” and SERVICE 4 responds “Hello from service4”
● SERVICE 2 responds “Hello from service2, response from service3 [Hello from service3] and from service4 [Hello from service4]”
● Calling /readtimeout on SERVICE 1 sends requests that fail: BOOM!
Log correlation with Spring Cloud Sleuth
DEMO
Great! We’ve found the exception!
But meanwhile....
The system is slow...
CLICK 200
One of the services is slow?
Which one?
How to measure that?
● Client Send (CS) - The client has made a request
● Server Received (SR) - The server side got the request and will start processing it
● Server Sent (SS) - Annotated upon completion of request processing
● Client Received (CR) - The client has successfully received the response from the server side
Let’s log events!
CS 0 ms   SR 100 ms   SS 300 ms   CR 450 ms
Conclusions
● The request started at T=0 ms
● It took 450 ms for the client to receive a response
● The server side received the request at T=100 ms
● The request was processed on the server side in 200 ms
Why is there a delay between sending and receiving messages?!!11!one!?!1!
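Remember the fallacies of distributed computing, e.g. “The network is reliable” and “Latency is zero”: https://blogs.oracle.com/jag/resource/Fallacies.html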
Distributed tracing - terminology
Span
Trace
Logs (annotations)
Tags (binary annotations)
Logs
Represents an event in time associated with a span
● Every span has zero or more logs
● Each log is a timestamped event name
● Event should be the stable name of some notable moment in the lifetime of a
span
● For instance, a span representing a browser page load might add an event for each of the Performance.timing moments (check https://developer.mozilla.org/en-US/docs/Web/API/PerformanceTiming)
Main logs
● Client Send (CS)
○ The client has made a request - the span was started
● Server Received (SR)
○ The server side got the request and will start processing it
○ SR timestamp - CS timestamp = NETWORK LATENCY (assuming the client and server clocks are in sync)
CS 0 ms SR 100 ms
Main logs
● Server Sent (SS)
○ Annotated upon completion of request processing
○ SS timestamp - SR timestamp = SERVER SIDE PROCESSING TIME
● Client Received (CR)
○ The client has successfully received the response from the server side
○ CR timestamp - CS timestamp = TIME NEEDED TO RECEIVE RESPONSE
○ CR timestamp - SS timestamp = NETWORK LATENCY (assuming the client and server clocks are in sync)
CS 0 ms   SR 100 ms   SS 300 ms   CR 450 ms
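The four subtractions above can be written down directly. This hypothetical `RpcTimings` class applies them to the slide's numbers (CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms); like the formulas, it assumes the client and server clocks are in sync.

```java
// Hypothetical helper: derives durations (in ms) from the four main logs of one RPC span.
// Assumes the client and server clocks are in sync.
final class RpcTimings {
    final long cs; // Client Send
    final long sr; // Server Received
    final long ss; // Server Sent
    final long cr; // Client Received

    RpcTimings(long cs, long sr, long ss, long cr) {
        this.cs = cs;
        this.sr = sr;
        this.ss = ss;
        this.cr = cr;
    }

    long requestNetworkLatency()  { return sr - cs; } // client -> server
    long serverProcessingTime()   { return ss - sr; }
    long timeToReceiveResponse()  { return cr - cs; }
    long responseNetworkLatency() { return cr - ss; } // server -> client

    public static void main(String[] args) {
        RpcTimings t = new RpcTimings(0, 100, 300, 450); // values from the slide
        System.out.println("request latency:   " + t.requestNetworkLatency() + " ms");  // 100 ms
        System.out.println("server processing: " + t.serverProcessingTime() + " ms");   // 200 ms
        System.out.println("total for client:  " + t.timeToReceiveResponse() + " ms");  // 450 ms
        System.out.println("response latency:  " + t.responseNetworkLatency() + " ms"); // 150 ms
    }
}
```

So of the 450 ms the client waited, only 200 ms were spent in the server; the remaining 250 ms went to the network.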
Tag
Key-value pair
● Every span may have zero or more key/value tags
● Tags do not have timestamps; they simply annotate the span
● Examples of default tags in Sleuth
○ message/payload-size
○ http.method
○ commandKey for Hystrix
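To make the difference between logs and tags concrete, here is a hypothetical span model (not Sleuth's or Zipkin's classes): logs are a list of timestamped, stably named events, tags are a plain key/value map with no time attached. The event names `cs`/`cr` and the `http.method` tag are used only as examples.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical span model separating logs (timestamped events) from tags (plain key/value pairs).
class AnnotatedSpan {
    // A log: the stable name of a notable moment plus the time it happened.
    static final class Log {
        final long timestampMillis;
        final String event;

        Log(long timestampMillis, String event) {
            this.timestampMillis = timestampMillis;
            this.event = event;
        }
    }

    final String name;
    final List<Log> logs = new ArrayList<>();               // zero or more, each timestamped
    final Map<String, String> tags = new LinkedHashMap<>(); // zero or more, no timestamps

    AnnotatedSpan(String name) {
        this.name = name;
    }

    AnnotatedSpan log(long timestampMillis, String event) {
        logs.add(new Log(timestampMillis, event));
        return this;
    }

    AnnotatedSpan tag(String key, String value) {
        tags.put(key, value);
        return this;
    }

    public static void main(String[] args) {
        AnnotatedSpan span = new AnnotatedSpan("start") // illustrative span name
                .log(0, "cs")
                .log(450, "cr")
                .tag("http.method", "GET");
        for (Log l : span.logs) {
            System.out.println(l.timestampMillis + " ms: " + l.event);
        }
        System.out.println("tags: " + span.tags);
    }
}
```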
How to visualise latency in
a distributed system?
The answer is: Zipkin
● Zipkin is a distributed tracing system
● It runs as a separate process (you can run it as a Spring Boot application)
● It helps gather timing data needed to troubleshoot latency problems in microservice architectures
● The front end is a "waterfall" style graph of service calls showing call durations as horizontal bars
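A text-only sketch of the "waterfall" idea: each span becomes a row whose indentation is its start offset from the beginning of the trace and whose bar length is its duration. The `Waterfall` class is hypothetical and only mimics the layout; it reuses the client (0 to 450 ms) and server (100 to 300 ms) timings from the CS/SR/SS/CR example.

```java
// Hypothetical text renderer for a Zipkin-like waterfall row:
// the offset shows when a span started, the bar length shows how long it took.
class Waterfall {
    static String row(String label, long startMs, long durationMs, long traceStartMs, long msPerChar) {
        int offset = (int) ((startMs - traceStartMs) / msPerChar);
        int width = (int) Math.max(1, durationMs / msPerChar); // always draw at least one character
        return String.format("%-8s|%s%s", label, " ".repeat(offset), "=".repeat(width));
    }

    public static void main(String[] args) {
        // Client span 0-450 ms, server span 100-300 ms, 50 ms per character.
        System.out.println(row("client", 0, 450, 0, 50));   // client  |=========
        System.out.println(row("server", 100, 200, 0, 50)); // server  |  ====
    }
}
```

The gaps between the end of the inner bar and the end of the outer bar are exactly the network latencies that are hard to spot in plain logs.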
How does Zipkin work?
Figure: each APP sends its spans to the COLLECTORS, which STORE them IN a DB; the UI QUERIES the stored TRACE INFO VIA the API
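The same flow as a toy, in-memory Java sketch. `MiniTracingBackend` is hypothetical and only mirrors the three arrows of the figure (collect, store, query by trace id); a real Zipkin server receives spans over the network and persists them in a storage backend.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the Zipkin pipeline: collect -> store -> query.
class MiniTracingBackend {
    static final class ReportedSpan {
        final String traceId;
        final String spanId;
        final String serviceName;
        final long durationMs;

        ReportedSpan(String traceId, String spanId, String serviceName, long durationMs) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.serviceName = serviceName;
            this.durationMs = durationMs;
        }
    }

    // "STORE IN DB": spans grouped by trace id.
    private final Map<String, List<ReportedSpan>> storage = new HashMap<>();

    // "SPANS SENT TO COLLECTORS": each app reports its finished spans here.
    synchronized void collect(ReportedSpan span) {
        storage.computeIfAbsent(span.traceId, id -> new ArrayList<>()).add(span);
    }

    // "UI QUERIES FOR TRACE INFO VIA API": returns a copy of all spans of one trace.
    synchronized List<ReportedSpan> getTrace(String traceId) {
        return new ArrayList<>(storage.getOrDefault(traceId, List.of()));
    }

    public static void main(String[] args) {
        MiniTracingBackend backend = new MiniTracingBackend();
        backend.collect(new ReportedSpan("X", "B", "service1", 450));
        backend.collect(new ReportedSpan("X", "C", "service2", 200));
        backend.collect(new ReportedSpan("Y", "A", "service1", 30));
        for (ReportedSpan s : backend.getTrace("X")) {
            System.out.println(s.serviceName + " span " + s.spanId + ": " + s.durationMs + " ms");
        }
    }
}
```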
Spring Cloud Sleuth and Zipkin integration
● We take care of passing tracing information between threads / libraries / contexts
● When a span is closed, we send it to Zipkin
○ either via HTTP (spring-cloud-sleuth-zipkin)
○ or via Spring Cloud Stream (spring-cloud-sleuth-stream)
● You can run the Zipkin Spring Cloud Stream Collector as a Spring Boot app (spring-cloud-sleuth-zipkin-stream)
○ and you can add the Zipkin UI dependency to it!
Spring Cloud Sleuth Zipkin with Maven
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.SR1</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-zipkin</artifactId>
    </dependency>
</dependencies>
Spring Cloud Sleuth Zipkin with Gradle
// requires the io.spring.dependency-management Gradle plugin to be applied
dependencies {
    compile "org.springframework.cloud:spring-cloud-starter-zipkin"
}
dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.SR1"
    }
}
HOLD IT!
● If I have a billion services that emit a gazillion spans, won’t I kill Zipkin?
Sampling to the rescue!
● By default, Spring Cloud Sleuth sends only 10% of requests to Zipkin
● You can change that by setting the spring.sleuth.sampler.percentage property (pass 1.0 for 100%)
● Or register a custom org.springframework.cloud.sleuth.Sampler implementation
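The idea behind percentage-based sampling, sketched in plain Java. `PercentageSampler` is hypothetical and is not Sleuth's implementation or its `Sampler` interface; as a swapped-in technique it derives the decision from the trace id, so every service that looks at the same trace reaches the same answer (in practice the decision is typically made once where the trace starts and propagated with the request). The `percentage` parameter mirrors the 0.0 to 1.0 range of spring.sleuth.sampler.percentage.

```java
// Hypothetical percentage sampler; not Sleuth's implementation.
class PercentageSampler {
    private final double percentage; // 0.0 .. 1.0, e.g. 0.1 for 10%

    PercentageSampler(double percentage) {
        if (percentage < 0.0 || percentage > 1.0) {
            throw new IllegalArgumentException("percentage must be between 0.0 and 1.0: " + percentage);
        }
        this.percentage = percentage;
    }

    // Deterministic decision based on the trace id, mapped to a bucket 0..99.
    boolean isSampled(long traceId) {
        long bucket = Math.floorMod(traceId, 100L);
        return bucket < Math.round(percentage * 100);
    }

    // Counts how many of the trace ids 0 .. traces-1 would be sent to Zipkin.
    static int countSampled(double percentage, int traces) {
        PercentageSampler sampler = new PercentageSampler(percentage);
        int sampled = 0;
        for (long id = 0; id < traces; id++) {
            if (sampler.isSampled(id)) {
                sampled++;
            }
        }
        return sampled;
    }

    public static void main(String[] args) {
        System.out.println("10%:  " + countSampled(0.1, 1000) + " of 1000 traces");
        System.out.println("100%: " + countSampled(1.0, 1000) + " of 1000 traces");
    }
}
```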
Figure: demo call graph with SERVICE 1 (/start), SERVICE 2 (/foo), SERVICE 3 (/bar), SERVICE 4 (/baz) and the DEVOXX SERVICE (/devoxx), each exchanging a REQUEST and a RESPONSE
DEMO
Traced call
● From START to END: the TOTAL DURATION of the call
● SERVICE 2 CLIENT: the CLIENT SENT and CLIENT RECEIVED events
● SERVICE 4 SERVER: the SERVER RECEIVED and SERVER SENT events
● The DIFF between SERVER RECEIVED and CLIENT SENT IS the LATENCY
Zipkin for Brewery
● A test app for Spring Cloud end-to-end tests
● Source code:
https://github.com/spring-cloud-samples/brewery
● Around 10 applications involved
Summary
● Log correlation allows you to match logs for a given trace
● Distributed tracing allows you to quickly see latency issues in your system
● Zipkin is a great tool to visualize the latency graph and system dependencies
● Spring Cloud Sleuth integrates with Zipkin and gives you log correlation
THANK YOU
● https://github.com/marcingrzejszczak/vagrant-elk-box/tree/presentation - code for this presentation (clone and run getReadyForConference.sh - NOTE: you need Vagrant!)
● https://github.com/spring-cloud/spring-cloud-sleuth - Spring Cloud Sleuth repository
● http://cloud.spring.io/spring-cloud-sleuth/spring-cloud-sleuth.html - Sleuth’s documentation
● http://toomuchcoding.com/blog/2016/03/25/spring-cloud-sleuth-rc1-deployed/ - article about RC1 release
● https://github.com/openzipkin/zipkin-java - Repo with Spring Boot Zipkin server
● http://docssleuth-service1.cfapps.io/start - The service1 app from this presentation deployed to Pivotal Cloud Foundry - point of entry to the app
● http://docssleuth-zipkin-server.cfapps.io/ - Zipkin deployed to Pivotal Cloud Foundry
● http://docsbrewing-zipkin-server.cfapps.io - Zipkin deployed to PCF for Brewery Sample app
