Microservices Tracing with Spring Cloud and Zipkin (devoxx)

Presentation from the Devoxx PL conference

Published in: Technology

  1. Microservices tracing with Spring Cloud and Zipkin. Marcin Grzejszczak (@mgrzejszczak), 24 June 2016
  2. About me
     Developer at Pivotal, part of the Spring Cloud Team
     Working with OSS:
     ● Accurest - Consumer Driven Contracts verifier for Java
     ● JSON Assert - fluent JSON assertions
     ● Spock Subjects Collaborators Extension
     ● Gradle Test Profiler
     ● Up To Date Gradle Plugin
     TWITTER: @MGrzejszczak
     BLOG: http://TOOMUCHCODING.COM
  4. Agenda
     ● What is distributed tracing?
     ● How to correlate logs with Spring Cloud Sleuth?
     ● How to visualize latency with Spring Cloud Sleuth and Zipkin?
  5. An ordinary system...
  6. UI calls backend: UI -> BACKEND
  7. Everything is awesome: CLICK -> 200
  8. Until it’s not: CLICK -> 500
  10. Time to debug (image: https://tonysbologna.files.wordpress.com/2015/09/mario-and-luigi.jpg?w=468&h=578&crop=1)
  11. It doesn’t look like this
  12. More like this
  13. On which server / instance was the exception thrown?
  14. SSH and grep for ERROR to find it?
  15. Distributed tracing - terminology
      ● Span
      ● Trace
      ● Logs (annotations)
      ● Tags (binary annotations)
  17. Span
      The basic unit of work (e.g. sending an RPC)
      ● Spans are started and stopped
      ● They keep track of their timing information
      ● Once you create a span, you must stop it at some point in the future
      ● A span has a parent (except the root span) and can have multiple children
  18. Trace
      A set of spans forming a tree-like structure.
      ● For example, if you are running a book store then
        ○ a trace could be retrieving a list of available books
        ○ assuming that to retrieve the books you have to send 3 requests to 3 services, you could have at least 3 spans (1 for each hop) forming 1 trace
  19. Diagram: trace propagation across four services. The request reaches Service 1 with no Trace Id and no Span Id; every span created afterwards carries Trace Id = X.
      ● Span A - created in Service 1 for the incoming request
      ● Span B - call from Service 1 to Service 2 (Client Send, Server Received, Server Sent, Client Received)
      ● Span C - work inside Service 2
      ● Span D - call from Service 2 to Service 3 (Client Send, Server Received, Server Sent, Client Received)
      ● Span E - work inside Service 3
      ● Span F - call from Service 2 to Service 4 (Client Send, Server Received, Server Sent, Client Received)
      ● Span G - work inside Service 4
  20. Parent of each span
      ● Span Id = A, Parent Id = null
      ● Span Id = B, Parent Id = A
      ● Span Id = C, Parent Id = B
      ● Span Id = D, Parent Id = C
      ● Span Id = E, Parent Id = D
      ● Span Id = F, Parent Id = C
      ● Span Id = G, Parent Id = F
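The parent/child relations on slide 20 can be sketched as a small data structure. This is only an illustrative model: the class `SpanTreeSketch` and its helper methods are assumptions made for this example and are not Sleuth's own `Span` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal model for illustration; not Sleuth's Span class.
final class SpanTreeSketch {

    /** One span: its trace id, its own id and its parent's id (null for the root). */
    static final class Span {
        final String traceId;
        final String spanId;
        final String parentId;

        Span(String traceId, String spanId, String parentId) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentId = parentId;
        }
    }

    /** The seven spans from the slide, all belonging to trace X. */
    static List<Span> slideExample() {
        return Arrays.asList(
                new Span("X", "A", null),
                new Span("X", "B", "A"),
                new Span("X", "C", "B"),
                new Span("X", "D", "C"),
                new Span("X", "E", "D"),
                new Span("X", "F", "C"),
                new Span("X", "G", "F"));
    }

    /** Maps each parent span id to the ids of its direct children, in input order. */
    static Map<String, List<String>> childrenByParent(List<Span> spans) {
        Map<String, List<String>> children = new LinkedHashMap<>();
        for (Span span : spans) {
            if (span.parentId != null) {
                children.computeIfAbsent(span.parentId, k -> new ArrayList<>()).add(span.spanId);
            }
        }
        return children;
    }

    /** Counts the parent links between a span and the root of its trace. */
    static int depth(String spanId, List<Span> spans) {
        Map<String, String> parentOf = new HashMap<>();
        for (Span span : spans) {
            parentOf.put(span.spanId, span.parentId);
        }
        int depth = 0;
        for (String current = parentOf.get(spanId); current != null; current = parentOf.get(current)) {
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        List<Span> spans = slideExample();
        System.out.println("children of C: " + childrenByParent(spans).get("C"));
        System.out.println("depth of G: " + depth("G", spans));
    }
}
```

Span C is the only span with two children (D and F), which is where the trace branches into the calls to Service 3 and Service 4.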
  22. Is it that simple?
      How do you pass tracing information (incl. Trace ID) between:
      ● different libraries?
      ● thread pools?
      ● asynchronous communication?
      ● …?
  23. What if you forget about a thread pool?
      [Diagram: the same call flow through Services 1 to 4; the request arrives with no trace, and the ids shown along the flow change from A to B, C and D.]
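The thread pool problem can be reproduced in plain Java. The sketch below assumes the tracing context lives in a `ThreadLocal`, which is a simplification made for this example; `CURRENT_TRACE_ID` and `withTraceContext` are hypothetical names and do not come from Sleuth.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch; Sleuth's real instrumentation is more involved.
final class TraceContextPropagationSketch {

    // Hypothetical holder of the current trace id, one value per thread.
    static final ThreadLocal<String> CURRENT_TRACE_ID = new ThreadLocal<>();

    /** Wraps a task so that it runs with the trace id of the submitting thread. */
    static <T> Callable<T> withTraceContext(Callable<T> task) {
        final String captured = CURRENT_TRACE_ID.get(); // read on the submitting thread
        return () -> {
            String previous = CURRENT_TRACE_ID.get();
            CURRENT_TRACE_ID.set(captured);
            try {
                return task.call();
            } finally {
                // Restore the pooled thread's earlier state so nothing leaks between tasks.
                if (previous == null) {
                    CURRENT_TRACE_ID.remove();
                } else {
                    CURRENT_TRACE_ID.set(previous);
                }
            }
        };
    }

    /** Returns the trace id seen inside the pool: first without wrapping, then with it. */
    static String[] demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CURRENT_TRACE_ID.set("X");
            Callable<String> readTraceId = () -> CURRENT_TRACE_ID.get();
            String lost = pool.submit(readTraceId).get();
            String kept = pool.submit(withTraceContext(readTraceId)).get();
            return new String[] {lost, kept};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            CURRENT_TRACE_ID.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println("without wrapping: " + result[0]);
        System.out.println("with wrapping: " + result[1]);
    }
}
```

The unwrapped task sees no trace id at all because the worker thread never received one, which is the situation the slide warns about; the wrapped task sees the id of the thread that submitted it.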
  24. Log correlation with Spring Cloud Sleuth
      We take care of passing tracing information between threads / libraries / contexts for
      ● Hystrix
      ● RxJava
      ● Rest Template
      ● Feign
      ● Messaging with Spring Integration
      ● Zuul
      ● ...
      If you don’t do anything unexpected, there’s nothing you need to do to make Sleuth work. Check the docs for more info.
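To make the idea of log correlation concrete, here is a minimal sketch of tagging every log line with the trace and span ids and then selecting the lines of one trace from an aggregated stream. The bracketed prefix layout and the names `format` and `linesForTrace` are assumptions made for this example; check the Sleuth docs for the format it actually writes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of log correlation; the prefix layout is hypothetical.
final class LogCorrelationSketch {

    /** Prefixes a message with the service name, trace id and span id. */
    static String format(String service, String traceId, String spanId, String message) {
        return "[" + service + "," + traceId + "," + spanId + "] " + message;
    }

    /** Keeps only the lines that belong to the given trace, whichever service wrote them. */
    static List<String> linesForTrace(List<String> aggregatedLines, String traceId) {
        String marker = "," + traceId + ",";
        List<String> result = new ArrayList<>();
        for (String line : aggregatedLines) {
            if (line.contains(marker)) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> aggregated = Arrays.asList(
                format("service1", "X", "A", "Calling service2"),
                format("service2", "Y", "K", "Unrelated request"),
                format("service2", "X", "C", "Read timed out"));
        for (String line : linesForTrace(aggregated, "X")) {
            System.out.println(line);
        }
    }
}
```

With the ids in every line, finding all log entries for one failed request becomes a single search by trace id instead of a hunt across machines.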
  25. Now let’s aggregate the logs!
      Instead of SSHing to the machines, aggregate the logs!
      ● With Cloud Foundry’s (CF) Loggregator the logs from different instances are streamed into a single place
      ● You can harvest your logs with Logstash Forwarder / FileBeat
      ● You can use the ELK stack to stream and visualize the logs
  26. Spring Cloud Sleuth with Maven
      <dependencyManagement>
        <dependencies>
          <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.SR1</version>
            <type>pom</type>
            <scope>import</scope>
          </dependency>
        </dependencies>
      </dependencyManagement>
      <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-sleuth</artifactId>
      </dependency>
  27. Spring Cloud Sleuth with Gradle
      dependencies {
          compile "org.springframework.cloud:spring-cloud-starter-sleuth"
      }
      dependencyManagement {
          imports {
              mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.SR1"
          }
      }
  28. Diagram: calling /start on Service 1. Service 1 calls Service 2, which calls Service 3 and Service 4.
      ● Service 3 responds: “Hello from service3”
      ● Service 4 responds: “Hello from service4”
      ● Service 2 responds: “Hello from service2, response from service3 [Hello from service3] and from service4 [Hello from service4]”
  29. Diagram: calling /readtimeout on Service 1, which calls Service 2. Each of the three requests shown ends with BOOM!
  30. Log correlation with Spring Cloud Sleuth - DEMO
  34. Great! We’ve found the exception! But meanwhile....
  35. The system is slow... CLICK -> 200
  36. One of the services is slow?
  37. Which one? How to measure that?
  38. Let’s log events!
      ● Client Send (CS) - The client has made a request
      ● Server Received (SR) - The server side got the request and will start processing
      ● Server Send (SS) - Annotated upon completion of request processing
      ● Client Received (CR) - The client has successfully received the response from the server side
  39. Timeline: CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms
  40. Conclusions (timeline: CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms)
      ● The request started at T=0 ms
      ● It took 450 ms for the client to receive a response
      ● Server side received the request at T=100 ms
      ● The request got processed on the server side in 200 ms
  41. Conclusions (timeline: CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms)
      Why is there a delay between sending and receiving messages?!!11!one!?!1!
  42. https://blogs.oracle.com/jag/resource/Fallacies.html
  43. Distributed tracing - terminology
      ● Span
      ● Trace
      ● Logs (annotations)
      ● Tags (binary annotations)
  44. Logs
      A log represents an event in time associated with a span
      ● Every span has zero or more logs
      ● Each log is a timestamped event name
      ● The event name should be the stable name of some notable moment in the lifetime of a span
      ● For instance, a span representing a browser page load might add an event for each of the Performance.timing moments (check https://developer.mozilla.org/en-US/docs/Web/API/PerformanceTiming)
  46. Main logs (timeline: CS 0 ms, SR 100 ms)
      ● Client Send (CS)
        ○ The client has made a request - the span was started
      ● Server Received (SR)
        ○ The server side got the request and will start processing it
        ○ SR timestamp - CS timestamp = NETWORK LATENCY
  47. Main logs (timeline: CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms)
      ● Server Send (SS)
        ○ Annotated upon completion of request processing
        ○ SS timestamp - SR timestamp = SERVER SIDE PROCESSING TIME
      ● Client Received (CR)
        ○ The client has successfully received the response from the server side
        ○ CR timestamp - CS timestamp = TIME NEEDED TO RECEIVE RESPONSE
        ○ CR timestamp - SS timestamp = NETWORK LATENCY
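The four formulas above can be checked against the example timeline (CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms). The class below is a hypothetical helper written for this example, not part of Sleuth or Zipkin, and it assumes that client and server clocks are in sync; timestamps taken on different machines may be skewed in practice.

```java
// Illustrative arithmetic on the four main logs; assumes synchronized clocks.
final class SpanTimingSketch {

    final long clientSend;      // CS timestamp in ms
    final long serverReceived;  // SR timestamp in ms
    final long serverSend;      // SS timestamp in ms
    final long clientReceived;  // CR timestamp in ms

    SpanTimingSketch(long clientSend, long serverReceived, long serverSend, long clientReceived) {
        this.clientSend = clientSend;
        this.serverReceived = serverReceived;
        this.serverSend = serverSend;
        this.clientReceived = clientReceived;
    }

    /** SR - CS: network latency of the request. */
    long requestNetworkLatency() {
        return serverReceived - clientSend;
    }

    /** SS - SR: server side processing time. */
    long serverProcessingTime() {
        return serverSend - serverReceived;
    }

    /** CR - SS: network latency of the response. */
    long responseNetworkLatency() {
        return clientReceived - serverSend;
    }

    /** CR - CS: total time the client waited for the response. */
    long totalTime() {
        return clientReceived - clientSend;
    }

    public static void main(String[] args) {
        SpanTimingSketch timing = new SpanTimingSketch(0, 100, 300, 450);
        System.out.println("request latency:  " + timing.requestNetworkLatency() + " ms");
        System.out.println("processing time:  " + timing.serverProcessingTime() + " ms");
        System.out.println("response latency: " + timing.responseNetworkLatency() + " ms");
        System.out.println("total time:       " + timing.totalTime() + " ms");
    }
}
```

For the slide's timeline this gives 100 ms of request latency, 200 ms of processing, 150 ms of response latency and 450 ms in total, so more than half of the time was spent on the network rather than in the server.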
  48. Tag
      Key-value pair
      ● Every span may have zero or more key/value Tags
      ● They do not have timestamps and simply annotate the spans
      ● Examples of default tags in Sleuth
        ○ message/payload-size
        ○ http.method
        ○ commandKey for Hystrix
  49. How to visualise latency in a distributed system?
  50. The answer is: Zipkin
      ● Zipkin is a distributed tracing system
      ● It runs as a separate process (you can run it as a Spring Boot application)
      ● It helps gather timing data needed to troubleshoot latency problems in microservice architectures
      ● The front end is a "waterfall" style graph of service calls showing call durations as horizontal bars
  51. How does Zipkin work?
      ● Apps send spans to collectors
      ● Collectors store the spans in a DB
      ● The UI queries for trace info via the API
  52. Spring Cloud Sleuth and Zipkin integration
      ● We take care of passing tracing information between threads / libraries / contexts
      ● Upon closing of a Span we will send it to Zipkin
        ○ either via HTTP (spring-cloud-sleuth-zipkin)
        ○ or via Spring Cloud Stream (spring-cloud-sleuth-stream)
      ● You can run Zipkin Spring Cloud Stream Collector as a Spring Boot app (spring-cloud-sleuth-zipkin-stream)
        ○ you can add the dependency to Zipkin UI!
  53. Spring Cloud Sleuth Zipkin with Maven
      <dependencyManagement>
        <dependencies>
          <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.SR1</version>
            <type>pom</type>
            <scope>import</scope>
          </dependency>
        </dependencies>
      </dependencyManagement>
      <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-zipkin</artifactId>
      </dependency>
  54. Spring Cloud Sleuth Zipkin with Gradle
      dependencies {
          compile "org.springframework.cloud:spring-cloud-starter-zipkin"
      }
      dependencyManagement {
          imports {
              mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.SR1"
          }
      }
  55. HOLD IT!
      ● If I have a billion services that emit a gazillion spans - won’t I kill Zipkin?
  56. Sampling to the rescue!
      ● By default Spring Cloud Sleuth sends only 10% of requests to Zipkin
      ● You can change that by changing the property spring.sleuth.sampler.percentage (for 100% pass 1.0)
      ● Or register a custom org.springframework.cloud.sleuth.Sampler implementation
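As a rough illustration of what percentage-based sampling means, the sketch below decides per trace id whether a trace would be exported. It is a standalone example and deliberately does not implement `org.springframework.cloud.sleuth.Sampler`; the class name, the method names and the modulo-based decision rule are all assumptions made for this example, not Sleuth's implementation.

```java
// Illustrative percentage sampler; not Sleuth's Sampler implementation.
final class PercentageSamplerSketch {

    private static final long BUCKETS = 10_000L;

    private final long threshold; // number of buckets (out of 10,000) that get sampled

    /** @param percentage fraction of traces to export, from 0.0 to 1.0 (1.0 means 100%) */
    PercentageSamplerSketch(double percentage) {
        if (percentage < 0.0 || percentage > 1.0) {
            throw new IllegalArgumentException("percentage must be between 0.0 and 1.0");
        }
        this.threshold = Math.round(percentage * BUCKETS);
    }

    /** Same answer for every span of a trace, because only the trace id is consulted. */
    boolean isSampled(long traceId) {
        return Math.floorMod(traceId, BUCKETS) < threshold;
    }

    /** Counts how many of the consecutive trace ids starting at firstTraceId are sampled. */
    static int countSampled(PercentageSamplerSketch sampler, long firstTraceId, int howMany) {
        int sampled = 0;
        for (long id = firstTraceId; id < firstTraceId + howMany; id++) {
            if (sampler.isSampled(id)) {
                sampled++;
            }
        }
        return sampled;
    }

    public static void main(String[] args) {
        System.out.println("10%:  " + countSampled(new PercentageSamplerSketch(0.1), 0, 10_000));
        System.out.println("100%: " + countSampled(new PercentageSamplerSketch(1.0), 0, 10_000));
    }
}
```

Deciding on the trace id rather than per span keeps a trace either fully exported or fully skipped, so the traces that do reach the tracing backend are complete. With randomly generated trace ids the exported share only approximates the configured percentage.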
  57. Diagram: request/response flow between SERVICE 1 (/start), SERVICE 2 (/foo), SERVICE 3 (/bar), SERVICE 4 (/baz) and DEVOXX SERVICE (/devoxx)
  58. DEMO
  59. Traced call
  60. Traced call: START, END, TOTAL DURATION
  61. Traced call: CLIENT SENT and CLIENT RECEIVED (labels: SERVICE 2, CLIENT)
  62. Traced call: SERVER RECEIVED and SERVER SENT (labels: SERVICE 4, SERVER)
  63. Traced call: LATENCY between CLIENT SENT and SERVER RECEIVED
  64. Traced call: the difference between CLIENT SENT and SERVER RECEIVED is the latency
  65. Zipkin for Brewery
      ● A test app for Spring Cloud end to end tests
      ● Source code: https://github.com/spring-cloud-samples/brewery
      ● Around 10 applications involved
  68. Summary
      ● Log correlation allows you to match logs for a given trace
      ● Distributed tracing allows you to quickly see latency issues in your system
      ● Zipkin is a great tool to visualize the latency graph and system dependencies
      ● Spring Cloud Sleuth integrates with Zipkin and grants you log correlation
  70. THANK YOU
      ● https://github.com/marcingrzejszczak/vagrant-elk-box/tree/presentation - code for this presentation (clone and run getReadyForConference.sh - NOTE: you need Vagrant!)
      ● https://github.com/spring-cloud/spring-cloud-sleuth - Spring Cloud Sleuth repository
      ● http://cloud.spring.io/spring-cloud-sleuth/spring-cloud-sleuth.html - Sleuth’s documentation
      ● http://toomuchcoding.com/blog/2016/03/25/spring-cloud-sleuth-rc1-deployed/ - article about RC1 release
      ● https://github.com/openzipkin/zipkin-java - Repo with Spring Boot Zipkin server
      ● http://docssleuth-service1.cfapps.io/start - The service1 app from this presentation deployed to Pivotal Cloud Foundry - point of entry to the app
      ● http://docssleuth-zipkin-server.cfapps.io/ - Zipkin deployed to Pivotal Cloud Foundry
      ● http://docsbrewing-zipkin-server.cfapps.io - Zipkin deployed to PCF for Brewery Sample app
