Microservices tracing with Spring Cloud and Zipkin
Marcin Grzejszczak
Marcin Grzejszczak @mgrzejszczak, 11-13 May 2016
About me
Developer at Pivotal
Part of Spring Cloud Team
Working with OSS:
● Accurest - Consumer Driven Contracts verifier for Java
● JSON Assert - fluent JSON assertions
● Spock Subjects Collaborators Extension
● Gradle Test Profiler
● Up To Date Gradle Plugin
TWITTER: @MGrzejszczak
BLOG: http://TOOMUCHCODING.COM
Marcin Grzejszczak @mgrzejszczak, Kraków, 11-13 May 2016
Agenda
What is distributed tracing?
How to correlate logs with Spring Cloud Sleuth?
How to visualize latency with Spring Cloud Sleuth and Zipkin?
An ordinary system...
UI calls backend
UI -> BACKEND
Everything is awesome
CLICK 200
Until it’s not
CLICK 500
Time to debug
https://tonysbologna.files.wordpress.com/2015/09/mario-and-luigi.jpg?w=468&h=578&crop=1
It doesn’t look like this
More like this
On which server / instance
was the exception thrown?
SSH and grep for ERROR to find it?
Distributed tracing - terminology
Span
Trace
Logs (annotations)
Tags (binary annotations)
Span
The basic unit of work (e.g. sending RPC)
● Spans are started and stopped
● They keep track of their timing information
● Once you create a span, you must stop it at some point in the future
● Has a parent and can have multiple children
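The span lifecycle described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `SketchSpan` and `SpanDemo` are hypothetical names invented for this example and are not Sleuth's Span API. The sketch shows a span that is started on creation, must be stopped exactly once, tracks its own timing, and links to a parent and its children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; NOT Sleuth's Span API.
final class SketchSpan {
    final String name;
    final SketchSpan parent;                        // null for the root span
    final List<SketchSpan> children = new ArrayList<>();
    private final long startNanos;
    private long stopNanos;
    private boolean stopped;

    // Creating a span starts it and registers it with its parent, if any.
    SketchSpan(String name, SketchSpan parent) {
        this.name = name;
        this.parent = parent;
        this.startNanos = System.nanoTime();
        if (parent != null) {
            parent.children.add(this);
        }
    }

    // Every started span must be stopped exactly once.
    void stop() {
        if (stopped) {
            throw new IllegalStateException("span already stopped: " + name);
        }
        stopNanos = System.nanoTime();
        stopped = true;
    }

    boolean isRunning() {
        return !stopped;
    }

    // Timing is only meaningful once the span has been stopped.
    long durationNanos() {
        if (!stopped) {
            throw new IllegalStateException("span still running: " + name);
        }
        return stopNanos - startNanos;
    }
}

final class SpanDemo {
    public static void main(String[] args) {
        SketchSpan root = new SketchSpan("list-books", null);
        SketchSpan child = new SketchSpan("call-inventory", root);
        child.stop();   // stop the child before its parent
        root.stop();
        System.out.println(child.name + " took " + child.durationNanos() + " ns");
    }
}
```

Stopping the child before the parent mirrors the nesting of the work itself: the parent's duration then includes the child's.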
Trace
A set of spans forming a tree-like structure.
● For example, if you are running a book store then
○ Trace could be retrieving a list of available books
○ Assuming that to retrieve the books you have to send 3 requests to 3 services
then you could have at least 3 spans (1 for each hop) forming 1 trace
[Diagram: a request with no Trace Id and no Span Id enters SERVICE 1, which starts Trace Id = X with Span Id = A. SERVICE 1 calls SERVICE 2 (Span Id = B), and SERVICE 2 calls SERVICE 3 (Span Id = D) and SERVICE 4 (Span Id = F). Each of the call spans B, D and F carries the Client Sent, Server Received, Server Sent and Client Received events. Spans C, E and G cover the work done inside SERVICE 2, SERVICE 3 and SERVICE 4. Every span shares Trace Id = X.]
Span Id = A, Parent Id = null
Span Id = B, Parent Id = A
Span Id = C, Parent Id = B
Span Id = D, Parent Id = C
Span Id = E, Parent Id = D
Span Id = F, Parent Id = C
Span Id = G, Parent Id = F
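To see why these span/parent pairs form a tree, the sketch below rebuilds the hierarchy from the seven pairs listed above and prints it with indentation. `SpanTree` and its methods are hypothetical names made up for this illustration; the sketch assumes exactly one root span (the one whose parent is null).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; assumes exactly one root span.
final class SpanTree {

    // The span id -> parent id pairs from the slide (A is the root).
    static Map<String, String> talkExample() {
        Map<String, String> parentOf = new LinkedHashMap<>();
        parentOf.put("A", null);
        parentOf.put("B", "A");
        parentOf.put("C", "B");
        parentOf.put("D", "C");
        parentOf.put("E", "D");
        parentOf.put("F", "C");
        parentOf.put("G", "F");
        return parentOf;
    }

    // Renders the tree, one span per line, indented two spaces per level.
    static String render(Map<String, String> parentOf) {
        Map<String, List<String>> children = new LinkedHashMap<>();
        String root = null;
        for (Map.Entry<String, String> e : parentOf.entrySet()) {
            if (e.getValue() == null) {
                root = e.getKey();
            } else {
                children.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
            }
        }
        StringBuilder out = new StringBuilder();
        append(root, 0, children, out);
        return out.toString();
    }

    private static void append(String id, int depth,
                               Map<String, List<String>> children, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(id).append('\n');
        for (String child : children.getOrDefault(id, new ArrayList<>())) {
            append(child, depth + 1, children, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(render(talkExample()));
    }
}
```

In the printed tree, C is the only span with two children (D and F), which matches SERVICE 2 being the only service that makes two outgoing calls.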
Is it that simple?
How do you pass tracing information (incl. Trace ID) between:
● different libraries?
● thread pools?
● asynchronous communication?
● …?
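The thread pool case is easy to reproduce. In the sketch below, a trace id kept in a `ThreadLocal` is invisible to a task running on a pool thread unless the task is wrapped so that the id is captured at submission time and restored on the worker. `TraceContext`, `wrap` and `PropagationDemo` are hypothetical names for this illustration only; this is not how Sleuth is implemented, just the general idea behind context propagation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical holder for the current trace id; NOT Sleuth's API.
final class TraceContext {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    static void set(String traceId) { TRACE_ID.set(traceId); }
    static String get() { return TRACE_ID.get(); }
    static void clear() { TRACE_ID.remove(); }

    // Captures the caller's trace id now and restores it on whichever
    // thread eventually runs the task, cleaning up afterwards.
    static <T> Callable<T> wrap(Callable<T> task) {
        final String captured = get();
        return () -> {
            String previous = get();
            set(captured);
            try {
                return task.call();
            } finally {
                if (previous == null) {
                    clear();
                } else {
                    set(previous);
                }
            }
        };
    }
}

final class PropagationDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            TraceContext.set("X");
            Callable<String> readTraceId = TraceContext::get;
            // Without wrapping, the worker thread has no trace id.
            String plain = pool.submit(readTraceId).get();
            // With wrapping, the caller's trace id travels with the task.
            String wrapped = pool.submit(TraceContext.wrap(readTraceId)).get();
            System.out.println("plain=" + plain + ", wrapped=" + wrapped);
        } finally {
            TraceContext.clear();
            pool.shutdown();
        }
    }
}
```

Running it prints `plain=null, wrapped=X`. Every library that hands work to another thread needs an equivalent of `wrap`, which is why doing this by hand does not scale.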
Log correlation with Spring Cloud Sleuth
We take care of passing tracing information between threads / libraries / contexts for
● Hystrix
● RxJava
● Rest Template
● Feign
● Messaging with Spring Integration
● Zuul
● ...
Unless you do something unusual, Sleuth works without any extra setup.
Check the docs for more info.
Now let’s aggregate the logs!
Instead of SSHing to the machines, aggregate the logs!
● With Cloud Foundry’s (CF) Loggregator the logs from different instances are
streamed into a single place
● You can harvest your logs with Logstash Forwarder / Filebeat
● You can use ELK stack to stream and visualize the logs
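Once logs from all instances sit in one place, correlation comes down to filtering by trace id. The sketch below assumes a simplified line layout, `LEVEL [service,traceId,spanId] message`, chosen for this illustration; the real layout depends on your logging configuration. `LogCorrelation`, the service names and the ids in `main` are all made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; assumes lines look like
// "LEVEL [service,traceId,spanId] message".
final class LogCorrelation {

    // Returns the trace id of a log line, or null if the line has no
    // bracketed section with at least two comma-separated fields.
    static String traceIdOf(String line) {
        int open = line.indexOf('[');
        int close = line.indexOf(']', open + 1);
        if (open < 0 || close < 0) {
            return null;
        }
        String[] fields = line.substring(open + 1, close).split(",");
        return fields.length >= 2 ? fields[1].trim() : null;
    }

    // Keeps only the lines that belong to the given trace, in input order.
    static List<String> linesForTrace(List<String> lines, String traceId) {
        List<String> matching = new ArrayList<>();
        for (String line : lines) {
            if (traceId.equals(traceIdOf(line))) {
                matching.add(line);
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        // Made-up aggregated log lines from three services.
        List<String> aggregated = Arrays.asList(
                "INFO  [service1,X,A] Handling /start",
                "INFO  [service2,Y,K] Unrelated request",
                "ERROR [service3,X,E] Something went wrong",
                "INFO  startup line without tracing data");
        for (String line : linesForTrace(aggregated, "X")) {
            System.out.println(line);
        }
    }
}
```

With the sample input, only the `service1` and `service3` lines are printed, so the error in `service3` can be tied back to the request that entered through `service1`.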
Spring Cloud Sleuth with Maven
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.RELEASE</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
Spring Cloud Sleuth with Gradle
dependencies {
    compile "org.springframework.cloud:spring-cloud-starter-sleuth"
}
dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.RELEASE"
    }
}
Log correlation with Spring Cloud Sleuth
DEMO
Great! We’ve found the exception!
But meanwhile....
The system is slow...
CLICK 200
One of the services is slow?
Which one?
How to measure that?
Let’s log events!
● Client Sent (CS) - The client has made a request
● Server Received (SR) - The server side got the request and will start processing it
● Server Sent (SS) - Annotated upon completion of request processing
● Client Received (CR) - The client has successfully received the response from the server side
[Timeline: CS at 0 ms, SR at 100 ms, SS at 200 ms, CR at 300 ms]
Conclusions
● The request started at T=0 ms
● It took 300 ms for the client to receive a response
● Server side received the request at T=100 ms
● The request got processed on the server side in 100 ms
● Why is there a delay between sending and receiving messages?
https://blogs.oracle.com/jag/resource/Fallacies.html
Logs
Represents an event in time associated with a span
● Every span has zero or more logs
● Each log is a timestamped event name
● Event should be the stable name of some notable moment in the lifetime of a
span
● For instance, a span representing a browser page load might add an event for
each of the Performance.timing moments (check https://developer.mozilla.org/en-US/docs/Web/API/PerformanceTiming)
Main logs
● Client Sent (CS)
○ The client has made a request - the span was started
● Server Received (SR)
○ The server side got the request and will start processing it
○ SR timestamp - CS timestamp = NETWORK LATENCY
● Server Sent (SS)
○ Annotated upon completion of request processing
○ SS timestamp - SR timestamp = SERVER SIDE PROCESSING TIME
● Client Received (CR)
○ The client has successfully received the response from the server side
○ CR timestamp - CS timestamp = TIME NEEDED TO RECEIVE RESPONSE
○ CR timestamp - SS timestamp = NETWORK LATENCY
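The arithmetic behind the four main logs can be written down directly. The sketch below applies it to the timeline used earlier in the talk (CS = 0 ms, SR = 100 ms, SS = 200 ms, CR = 300 ms), taking the return-path latency as CR minus SS so that it comes out positive. `SpanTimings` is a hypothetical class for this illustration, and the two network figures are only meaningful if the client and server clocks are reasonably synchronized.

```java
// Hypothetical value class; all timestamps are in milliseconds.
final class SpanTimings {
    final long cs;  // Client Sent
    final long sr;  // Server Received
    final long ss;  // Server Sent
    final long cr;  // Client Received

    SpanTimings(long cs, long sr, long ss, long cr) {
        this.cs = cs;
        this.sr = sr;
        this.ss = ss;
        this.cr = cr;
    }

    // Time the request spent on the wire (needs synchronized clocks).
    long requestNetworkLatency() { return sr - cs; }

    // Time the server spent processing the request.
    long serverProcessingTime() { return ss - sr; }

    // Time the response spent on the wire (needs synchronized clocks).
    long responseNetworkLatency() { return cr - ss; }

    // Total time the client waited for the response.
    long totalClientTime() { return cr - cs; }

    public static void main(String[] args) {
        SpanTimings t = new SpanTimings(0, 100, 200, 300);
        System.out.println("request network latency:  " + t.requestNetworkLatency() + " ms");
        System.out.println("server processing time:   " + t.serverProcessingTime() + " ms");
        System.out.println("response network latency: " + t.responseNetworkLatency() + " ms");
        System.out.println("total client time:        " + t.totalClientTime() + " ms");
    }
}
```

For this timeline the three parts are 100 ms each and add up to the 300 ms the client observed, so two thirds of the time was spent on the network rather than in the server.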
Tag
Key-value pair
● Every span may also have zero or more key/value Tags
● They do not have timestamps and simply annotate the spans.
● Example of default tags in Sleuth
○ message/payload-size
○ http.method
○ commandKey for Hystrix
How to visualise latency in a distributed system?
The answer is: Zipkin
● Zipkin is a distributed tracing system
● It runs as a separate process (you can run it as a Spring Boot application)
● It helps gather timing data needed to troubleshoot latency problems in
microservice architectures
● The front end is a "waterfall" style graph of service calls showing call durations
as horizontal bars
How does Zipkin work?
[Diagram: instrumented APPs send spans to the collectors, the collectors store them in a DB, and the UI queries for trace info via the API.]
Spring Cloud Sleuth and Zipkin integration
● We take care of passing tracing information between threads / libraries /
contexts
● Upon closing of a Span we will send it to Zipkin
○ either via HTTP (spring-cloud-sleuth-zipkin)
○ or via Spring Cloud Stream (spring-cloud-sleuth-stream)
● You can run Zipkin Spring Cloud Stream Collector as a Spring Boot app (spring-cloud-sleuth-zipkin-stream)
○ you can add the dependency to Zipkin UI!
Spring Cloud Sleuth Zipkin with Maven
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.RELEASE</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>
Spring Cloud Sleuth Zipkin with Gradle
dependencies {
    compile "org.springframework.cloud:spring-cloud-starter-zipkin"
}
dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.RELEASE"
    }
}
[Diagram: the same trace as before, now with endpoints. A request with no Trace Id and no Span Id hits /start on SERVICE 1, which starts Trace Id = X with Span Id = A. SERVICE 1 calls /foo on SERVICE 2 (Span Id = B), and SERVICE 2 calls /bar on SERVICE 3 (Span Id = D) and /baz on SERVICE 4 (Span Id = F). Each of the call spans B, D and F carries the Client Sent, Server Received, Server Sent and Client Received events. Spans C, E and G cover the work done inside SERVICE 2, SERVICE 3 and SERVICE 4.]
DEMO
Zipkin for Brewery
● A test app for Spring Cloud end to end tests
● Source code: https://github.com/spring-cloud-samples/brewery
● Around 10 applications involved
Summary
● Log correlation allows you to match logs for a given trace
● Distributed tracing allows you to quickly see latency issues in your system
● Zipkin is a great tool to visualize the latency graph and system dependencies
● Spring Cloud Sleuth integrates with Zipkin and grants you log correlation
THANK YOU
● https://github.com/marcingrzejszczak/vagrant-elk-box/tree/presentation - code for this presentation (clone and run getReadyForConference.sh - NOTE: you need Vagrant!)
● https://github.com/spring-cloud/spring-cloud-sleuth - Spring Cloud Sleuth repository
● http://cloud.spring.io/spring-cloud-sleuth/spring-cloud-sleuth.html - Sleuth’s documentation
● http://toomuchcoding.com/blog/2016/03/25/spring-cloud-sleuth-rc1-deployed/ - article about RC1 release
● https://github.com/openzipkin/zipkin-java - Repo with Spring Boot Zipkin server
● http://docssleuth-service1.cfapps.io/start - The service1 app from this presentation deployed to Pivotal Cloud Foundry - point of entry to the app
● http://docssleuth-zipkin-server.cfapps.io/ - Zipkin deployed to Pivotal Cloud Foundry
● http://brewery-zipkin-web.cfapps.io - Zipkin deployed to PCF for Brewery Sample app
