Microservices tracing with
Spring Cloud and Zipkin
Marcin Grzejszczak
Marcin Grzejszczak @mgrzejszczak, 21 May 2016
About me
Developer at Pivotal
Part of Spring Cloud Team
Working with OSS:
● Accurest - Consumer Driven Contracts verifier for Java
● JSON Assert - fluent JSON assertions
● Spock Subjects Collaborators Extension
● Gradle Test Profiler
● Up To Date Gradle Plugin
TWITTER: @MGrzejszczak
BLOG: http://TOOMUCHCODING.COM
Agenda
What is distributed tracing?
How to correlate logs with Spring Cloud Sleuth?
How to visualize latency with Spring Cloud Sleuth and Zipkin?
An ordinary system...
UI calls backend
UI -> BACKEND
Everything is awesome
CLICK 200
Until it’s not
CLICK 500
Time to debug
https://tonysbologna.files.wordpress.com/2015/09/mario-and-luigi.jpg?w=468&h=578&crop=1
It doesn’t look like this
More like this
On which server / instance
was the exception thrown?
SSH and grep for ERROR to find it?
Distributed tracing - terminology
Span
Trace
Logs (annotations)
Tags (binary annotations)
Span
The basic unit of work (e.g. sending RPC)
● Spans are started and stopped
● They keep track of their timing information
● Once you create a span, you must stop it at some point in the future
● A span has a parent (unless it is the root span) and can have multiple children
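The span properties listed above can be sketched in plain Java. This is a minimal illustration, not Spring Cloud Sleuth's own Span class: SimpleSpan, SpanDemo and the span names are hypothetical and exist only to show start/stop, timing and the parent/child relation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified span (NOT Sleuth's Span class).
class SimpleSpan {
    final String name;
    final SimpleSpan parent;                       // null for the root span
    final List<SimpleSpan> children = new ArrayList<>();
    private final long startNanos;
    private long endNanos;
    private boolean stopped;

    // Creating a span starts it and registers it with its parent, if any.
    SimpleSpan(String name, SimpleSpan parent) {
        this.name = name;
        this.parent = parent;
        this.startNanos = System.nanoTime();
        if (parent != null) {
            parent.children.add(this);
        }
    }

    // Every started span must be stopped at some point; stopping twice is a no-op.
    void stop() {
        if (!stopped) {
            endNanos = System.nanoTime();
            stopped = true;
        }
    }

    boolean isRunning() {
        return !stopped;
    }

    // Elapsed time so far while running, final duration once stopped.
    long durationNanos() {
        long end = stopped ? endNanos : System.nanoTime();
        return end - startNanos;
    }
}

class SpanDemo {
    public static void main(String[] args) {
        SimpleSpan root = new SimpleSpan("http:/start", null);
        SimpleSpan child = new SimpleSpan("call-service2", root);
        child.stop();
        root.stop();
        System.out.println(root.name + " has " + root.children.size()
                + " child span(s), took " + root.durationNanos() + " ns");
    }
}
```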
Trace
A set of spans forming a tree-like structure.
● For example, if you are running a book store then
○ A trace could be retrieving a list of available books
○ Assuming that to retrieve the books you have to send 3 requests to 3 services,
you would have at least 3 spans (1 for each hop) forming 1 trace
[Diagram: one request flowing through four services; every span carries Trace Id = X]
● The REQUEST reaches SERVICE 1 with no Trace Id and no Span Id; SERVICE 1 starts Span Id = A
● SERVICE 1 -> SERVICE 2: Span Id = B (Client Sent, Server Received, Server Sent, Client Received); SERVICE 2 works in Span Id = C
● SERVICE 2 -> SERVICE 3: Span Id = D (Client Sent, Server Received, Server Sent, Client Received); SERVICE 3 works in Span Id = E
● SERVICE 2 -> SERVICE 4: Span Id = F (Client Sent, Server Received, Server Sent, Client Received); SERVICE 4 works in Span Id = G
● Each RESPONSE travels back with the Trace Id and Span Id of the request it answers
Span Id = A, Parent Id = null
Span Id = B, Parent Id = A
Span Id = C, Parent Id = B
Span Id = D, Parent Id = C
Span Id = E, Parent Id = D
Span Id = F, Parent Id = C
Span Id = G, Parent Id = F
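The Span Id / Parent Id pairs above are enough to rebuild the tree of a trace. A small sketch in plain Java (SpanTree and its method names are made up for this example) walks the parent links and lists the children of a span:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that rebuilds the span tree from Span Id / Parent Id pairs.
class SpanTree {
    // Span Id -> Parent Id, copied from the slide (null marks the root span).
    static Map<String, String> parents() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("A", null);
        p.put("B", "A");
        p.put("C", "B");
        p.put("D", "C");
        p.put("E", "D");
        p.put("F", "C");
        p.put("G", "F");
        return p;
    }

    // Follows the parent links from a span up to the root span.
    static List<String> pathToRoot(Map<String, String> parents, String spanId) {
        List<String> path = new ArrayList<>();
        for (String id = spanId; id != null; id = parents.get(id)) {
            path.add(id);
        }
        return path;
    }

    // Collects the direct children of a span, in insertion order.
    static List<String> childrenOf(Map<String, String> parents, String spanId) {
        List<String> children = new ArrayList<>();
        for (Map.Entry<String, String> e : parents.entrySet()) {
            if (spanId.equals(e.getValue())) {
                children.add(e.getKey());
            }
        }
        return children;
    }

    public static void main(String[] args) {
        Map<String, String> parents = parents();
        System.out.println(pathToRoot(parents, "G")); // prints [G, F, C, B, A]
        System.out.println(childrenOf(parents, "C")); // prints [D, F]
    }
}
```

Span C has two children (D and F) because SERVICE 2 calls both SERVICE 3 and SERVICE 4.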
Is it that simple?
How do you pass tracing information (incl. Trace ID)
between:
● different libraries?
● thread pools?
● asynchronous communication?
● …?
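To see why thread pools are a problem, here is a hand-rolled sketch in plain Java. It is not how Sleuth does it internally; TraceContext and its methods are hypothetical. A trace ID kept in a ThreadLocal is invisible to a pool thread unless the task is wrapped so that the caller's value is captured and restored:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical holder for the current trace ID (NOT a Sleuth class).
class TraceContext {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    static void set(String traceId) { TRACE_ID.set(traceId); }
    static String get() { return TRACE_ID.get(); }
    static void clear() { TRACE_ID.remove(); }

    // Captures the caller's trace ID now and restores it inside the worker thread.
    static <T> Callable<T> wrap(Callable<T> task) {
        final String captured = get();
        return () -> {
            String previous = get();
            set(captured);
            try {
                return task.call();
            } finally {
                // Leave the pool thread as we found it.
                if (previous == null) {
                    clear();
                } else {
                    set(previous);
                }
            }
        };
    }
}

class PropagationDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            TraceContext.set("X");
            Callable<String> readTraceId = TraceContext::get;
            // The pool thread has its own ThreadLocal slot, so the ID is lost.
            System.out.println("without wrapping: " + pool.submit(readTraceId).get());
            // Wrapping carries the caller's ID over to the pool thread.
            System.out.println("with wrapping: " + pool.submit(TraceContext.wrap(readTraceId)).get());
        } finally {
            pool.shutdown();
            TraceContext.clear();
        }
    }
}
```

The first line printed ends with null, the second with X. Doing this by hand for every library, pool and messaging channel is exactly the chore the next slides are about.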
Would you want to do that yourself?
Log correlation with Spring Cloud Sleuth
We take care of passing tracing information between threads / libraries / contexts for
● Hystrix
● RxJava
● RestTemplate
● Feign
● Messaging with Spring Integration
● Zuul
● ...
If you don’t do anything unexpected, there’s nothing you need to do to make
Sleuth work. Check the docs for more info.
Now let’s aggregate the logs!
Instead of SSHing to the machines, aggregate the logs!
● With Cloud Foundry’s (CF) Loggregator the logs from different instances are
streamed into a single place
● You can harvest your logs with Logstash Forwarder / Filebeat
● You can use the ELK stack to stream and visualize the logs
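Once the logs sit in one place, correlating them comes down to filtering by trace ID. The sketch below assumes each log line carries an [app,traceId,spanId,exportable] section, which is the shape described in Sleuth's documentation. The sample lines, the IDs and the LogCorrelation class are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that picks the lines of one trace out of aggregated logs.
class LogCorrelation {
    // Assumed layout: "[app,traceId,spanId,exportable]" somewhere in the line.
    private static final Pattern TRACE_SECTION =
            Pattern.compile("\\[([^,\\]]*),([^,\\]]*),([^,\\]]*),([^,\\]]*)\\]");

    // Returns the trace ID found in a log line, or null when the line has none.
    static String traceIdOf(String line) {
        Matcher m = TRACE_SECTION.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    // Keeps only the lines that belong to the given trace, in their original order.
    static List<String> linesForTrace(List<String> lines, String traceId) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            if (traceId.equals(traceIdOf(line))) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample lines from three services.
        List<String> aggregated = Arrays.asList(
                "INFO [service1,abc123,abc123,false] Hello from service1",
                "INFO [service2,ffff00,ffff00,false] Unrelated request",
                "ERROR [service3,abc123,def456,false] Something went wrong");
        for (String line : linesForTrace(aggregated, "abc123")) {
            System.out.println(line);
        }
    }
}
```

With the ELK stack the same filtering is a search on the trace ID field instead of a loop.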
Spring Cloud Sleuth with Maven
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.RELEASE</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
Spring Cloud Sleuth with Gradle
dependencies {
    compile "org.springframework.cloud:spring-cloud-starter-sleuth"
}
dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.RELEASE"
    }
}
[Diagram: the demo application]
● SERVICE 1 sends a REQUEST to SERVICE 2
● SERVICE 2 sends a REQUEST to SERVICE 3 and to SERVICE 4
● SERVICE 3 responds with “Hello from service3”
● SERVICE 4 responds with “Hello from service4”
● SERVICE 2 responds with “Hello from service2, response from service3 [Hello from service3] and from service4 [Hello from service4]”
Log correlation with Spring Cloud Sleuth
DEMO
Great! We’ve found the exception!
But meanwhile....
The system is slow...
CLICK 200
One of the services is slow?
Which one?
How to measure that?
Let’s log events!
● Client Sent (CS) - The client has made a request
● Server Received (SR) - The server side got the request and will start processing it
● Server Sent (SS) - Annotated upon completion of request processing
● Client Received (CR) - The client has successfully received the response from the server side
CS 0 ms -> SR 100 ms -> SS 200 ms -> CR 300 ms
Conclusions
CS 0 ms -> SR 100 ms -> SS 200 ms -> CR 300 ms
● The request started at T=0 ms
● It took 300 ms for the client to receive a response
● Server side received the request at T=100 ms
● The request got processed on the server side in 100 ms
Why is there a delay between sending and receiving messages?!!11!one!?!1!
https://blogs.oracle.com/jag/resource/Fallacies.html
Logs
Represents an event in time associated with a span
● Every span has zero or more logs
● Each log is a timestamped event name
● Event should be the stable name of some notable moment in the lifetime of a
span
● For instance, a span representing a browser page load might add an event for
each of the Performance.timing moments (check
https://developer.mozilla.org/en-US/docs/Web/API/PerformanceTiming)
Main logs
● Client Sent (CS)
○ The client has made a request - the span was started
● Server Received (SR)
○ The server side got the request and will start processing it
○ SR timestamp - CS timestamp = NETWORK LATENCY
CS 0 ms SR 100 ms
Main logs
● Server Sent (SS)
○ Annotated upon completion of request processing
○ SS timestamp - SR timestamp = SERVER SIDE PROCESSING TIME
● Client Received (CR)
○ The client has successfully received the response from the server side
○ CR timestamp - CS timestamp = TIME NEEDED TO RECEIVE RESPONSE
○ CR timestamp - SS timestamp = NETWORK LATENCY
CS 0 ms -> SR 100 ms -> SS 200 ms -> CR 300 ms
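The formulas from the Main logs slides can be checked with the sample timeline (CS 0 ms, SR 100 ms, SS 200 ms, CR 300 ms). A minimal sketch in plain Java follows. SpanTimings and its method names are hypothetical, and subtracting a client timestamp from a server timestamp assumes that both clocks are synchronized.

```java
// Hypothetical helper; timestamps are in milliseconds.
class SpanTimings {
    final long cs; // Client Sent
    final long sr; // Server Received
    final long ss; // Server Sent
    final long cr; // Client Received

    SpanTimings(long cs, long sr, long ss, long cr) {
        this.cs = cs;
        this.sr = sr;
        this.ss = ss;
        this.cr = cr;
    }

    // SR timestamp - CS timestamp = network latency of the request
    long requestNetworkLatency() { return sr - cs; }

    // SS timestamp - SR timestamp = server side processing time
    long serverProcessingTime() { return ss - sr; }

    // CR timestamp - SS timestamp = network latency of the response
    long responseNetworkLatency() { return cr - ss; }

    // CR timestamp - CS timestamp = time needed to receive the response
    long timeToReceiveResponse() { return cr - cs; }

    public static void main(String[] args) {
        SpanTimings t = new SpanTimings(0, 100, 200, 300);
        System.out.println("request network latency:  " + t.requestNetworkLatency() + " ms");  // 100 ms
        System.out.println("server processing time:   " + t.serverProcessingTime() + " ms");   // 100 ms
        System.out.println("response network latency: " + t.responseNetworkLatency() + " ms"); // 100 ms
        System.out.println("time to receive response: " + t.timeToReceiveResponse() + " ms");  // 300 ms
    }
}
```

Only 100 of the 300 ms are spent processing on the server side; the remaining 200 ms are spent on the network.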
Tag
Key-value pair
● Every span may have zero or more key/value Tags
● They do not have timestamps and simply annotate the spans.
● Example of default tags in Sleuth
○ message/payload-size
○ http.method
○ commandKey for Hystrix
How to visualise latency in
a distributed system?
The answer is: Zipkin
● Zipkin is a distributed tracing system
● It runs as a separate process (you can run it as a Spring Boot application)
● It helps gather timing data needed to troubleshoot latency problems in
microservice architectures
● The front end is a "waterfall" style graph of service calls showing call durations
as horizontal bars
How does Zipkin work?
[Diagram: Zipkin architecture]
● Each APP sends its spans to the collectors
● The collectors store the spans in a DB
● The UI queries for trace info via the API
Spring Cloud Sleuth and Zipkin integration
● We take care of passing tracing information between threads / libraries /
contexts
● Upon closing of a Span we will send it to Zipkin
○ either via HTTP (spring-cloud-sleuth-zipkin)
○ or via Spring Cloud Stream (spring-cloud-sleuth-stream)
● You can run the Zipkin Spring Cloud Stream Collector as a Spring Boot app
(spring-cloud-sleuth-zipkin-stream)
○ you can add the dependency to Zipkin UI!
Spring Cloud Sleuth Zipkin with Maven
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.RELEASE</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>
Spring Cloud Sleuth Zipkin with Gradle
dependencies {
    compile "org.springframework.cloud:spring-cloud-starter-zipkin"
}
dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.RELEASE"
    }
}
HOLD IT!
● If I have a billion services that emit a gazillion spans - won’t I kill Zipkin?
Sampling to the rescue!
● By default Spring Cloud Sleuth sends only 10% of requests to Zipkin
● You can change that by changing the property
spring.sleuth.sampler.percentage (for 100% pass 1.0)
● Or register a custom org.springframework.cloud.sleuth.Sampler
implementation
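The idea behind percentage-based sampling can be sketched in a few lines of plain Java. This is only an illustration of the technique: TraceSampler and PercentageSampler are hypothetical stand-ins, not the org.springframework.cloud.sleuth.Sampler interface or Sleuth's own sampler implementation.

```java
import java.util.Random;

// Hypothetical stand-in for a sampler (NOT Sleuth's Sampler interface).
interface TraceSampler {
    boolean isSampled();
}

// Samples roughly the given fraction of requests: 0.1 means about 10%, 1.0 means all.
class PercentageSampler implements TraceSampler {
    private final double percentage;
    private final Random random;

    PercentageSampler(double percentage, Random random) {
        if (percentage < 0.0 || percentage > 1.0) {
            throw new IllegalArgumentException("percentage must be between 0.0 and 1.0");
        }
        this.percentage = percentage;
        this.random = random;
    }

    @Override
    public boolean isSampled() {
        // nextDouble() is in [0.0, 1.0), so 1.0 always samples and 0.0 never does.
        return random.nextDouble() < percentage;
    }
}

class SamplingDemo {
    public static void main(String[] args) {
        TraceSampler sampler = new PercentageSampler(0.1, new Random());
        int requests = 10_000;
        int sampled = 0;
        for (int i = 0; i < requests; i++) {
            if (sampler.isSampled()) {
                sampled++;
            }
        }
        System.out.println(sampled + " of " + requests + " requests would be sent to Zipkin");
    }
}
```

With 0.1 roughly one request in ten is reported, which keeps the load on Zipkin bounded while still showing typical latencies.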
[Diagram: the demo application with its endpoints; every call is a REQUEST / RESPONSE pair]
● SERVICE 1 - /start
● SERVICE 2 - /foo
● SERVICE 3 - /bar
● SERVICE 4 - /baz
● CYBERCOM SERVICE - /cybercom
DEMO
Traced call
[Zipkin screenshots of a traced call, annotated with:]
● START, END and TOTAL DURATION of the trace
● CLIENT SENT and CLIENT RECEIVED for SERVICE 2 (client side)
● SERVER RECEIVED and SERVER SENT for SERVICE 4 (server side)
Zipkin for Brewery
● A test app for Spring Cloud end to end tests
● Source code:
https://github.com/spring-cloud-samples/brewery
● Around 10 applications involved
Summary
● Log correlation allows you to match logs for a given trace
● Distributed tracing allows you to quickly see latency issues in your system
● Zipkin is a great tool to visualize the latency graph and system dependencies
● Spring Cloud Sleuth integrates with Zipkin and grants you log correlation
THANK YOU
● https://github.com/marcingrzejszczak/vagrant-elk-box/tree/presentation - code for this presentation (clone
and run getReadyForConference.sh - NOTE: you need Vagrant!)
● https://github.com/spring-cloud/spring-cloud-sleuth - Spring Cloud Sleuth repository
● http://cloud.spring.io/spring-cloud-sleuth/spring-cloud-sleuth.html - Sleuth’s documentation
● http://toomuchcoding.com/blog/2016/03/25/spring-cloud-sleuth-rc1-deployed/ - article about RC1 release
● https://github.com/openzipkin/zipkin-java - Repo with Spring Boot Zipkin server
● http://docssleuth-service1.cfapps.io/start - The service1 app from this presentation deployed to Pivotal Cloud
Foundry - point of entry to the app
● http://docssleuth-zipkin-server.cfapps.io/ - Zipkin deployed to Pivotal Cloud Foundry
● http://brewery-zipkin-web.cfapps.io - Zipkin deployed to PCF for Brewery Sample app

Microservices Tracing With Spring Cloud and Zipkin @CybercomDEV
