
Microservices Tracing With Spring Cloud and Zipkin @Szczecin JUG


The hype related to microservices continues. It’s already common knowledge that creating distributed systems is not easy. It’s high time to show how that complexity can be contained.

Service Discovery and Registry (Zookeeper / Consul / Eureka), easy request sending with client side load balancing (Feign + Ribbon), request proxying with Zuul. Everything is easy with Spring Cloud. Just add a dependency, a couple of lines of configuration and you’re ready to go.

That’s fixing difficulties related to writing code - what about solving the complexity of debugging distributed systems? Log correlation and visualizing latency of parts of the system? Spring Cloud Sleuth with Zipkin to the rescue!

The presentation will consist of some theory but there’ll also be live coding and demos.

Published in: Technology


  1. Implementing Microservices Tracing with Spring Cloud and Zipkin
     Marcin Grzejszczak, @mgrzejszczak
     © 2017 Pivotal
  2. About me
     Spring Cloud developer at Pivotal. Working mostly on:
     ● Spring Cloud Sleuth
     ● Spring Cloud Contract
     ● Spring Cloud Pipelines
     Twitter: @mgrzejszczak
     Blog: http://toomuchcoding.com
  3. Agenda
     ● What is distributed tracing?
     ● How to correlate logs with Spring Cloud Sleuth?
     ● How to visualize latency with Spring Cloud Sleuth and Zipkin?
  5. An ordinary system...
  6. UI calls backend: UI -> BACKEND
  7. CLICK -> 200. Everything is awesome
  8. CLICK -> 500. Until it’s not
  10. Time to debug (image: https://tonysbologna.files.wordpress.com/2015/09/mario-and-luigi.jpg?w=468&h=578&crop=1)
  11. It doesn’t look like this
  12. More like this
  13. On which server / instance was the exception thrown?
  14. SSH and grep for ERROR to find it?
  15. How to find all logs from all servers that correspond to that business action?
  16. The answer: distributed tracing
      • Span
      • Trace
      • Baggage
      • Logs (annotations)
      • Tags (binary annotations)
  18. Span
      The basic unit of work (e.g. sending an RPC)
      • Spans are started and stopped
      • They keep track of their timing information
      • Once you create a span, you must stop it at some point in the future
      • A span has a parent (except for the root span) and can have multiple children
      • All spans have unique span ids
      • Spans in a single hierarchy share a trace id
  19. Trace
      A set of spans forming a tree-like structure.
      • For example, if you are running a bookstore, a trace could be retrieving a list of available books
      • Assuming that to retrieve the books you have to send 3 requests to 3 services, you would have at least 3 spans (1 for each hop) forming 1 trace
  20. Baggage (from Sleuth 1.2.0)
      Key-value pairs that get propagated across network boundaries
      • Once set, baggage is accessible in every application for the duration of the trace
      • Works for HTTP and messaging-based communication
      • WARNING: if the baggage is too large, your latency can increase
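To make the baggage idea concrete, here is a minimal plain-Java sketch of carrying key-value pairs across a network hop as prefixed headers. It is not Sleuth's API: the class `BaggageSketch`, its methods, and the `baggage-` prefix are assumptions made for illustration, and header case-insensitivity is ignored. The `addedChars` helper hints at why the slide warns about size: every baggage entry travels with every request in the trace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Spring Cloud Sleuth.
final class BaggageSketch {

    // Assumed header prefix, chosen for this sketch only.
    static final String PREFIX = "baggage-";

    // Client side: turn baggage entries into headers for the outgoing request.
    static Map<String, String> inject(Map<String, String> baggage) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : baggage.entrySet()) {
            headers.put(PREFIX + e.getKey(), e.getValue());
        }
        return headers;
    }

    // Server side: pick the baggage entries back out of the incoming headers.
    static Map<String, String> extract(Map<String, String> headers) {
        Map<String, String> baggage = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                baggage.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return baggage;
    }

    // Characters added to each request by the given headers (names plus values).
    static int addedChars(Map<String, String> headers) {
        int total = 0;
        for (Map.Entry<String, String> e : headers.entrySet()) {
            total += e.getKey().length() + e.getValue().length();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, String> baggage = new LinkedHashMap<>();
        baggage.put("user-id", "42"); // made-up example entry
        Map<String, String> headers = inject(baggage);
        System.out.println(headers + " adds " + addedChars(headers) + " chars per request");
        System.out.println(extract(headers));
    }
}
```

Because the receiving side re-injects whatever it extracted into its own outgoing calls, the cost in `addedChars` is paid on every hop of the trace.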
  21. A REQUEST arrives at SERVICE 1 with no Trace Id and no Span Id. Every span below has Trace Id = X.
      ● SERVICE 1: Span Id = A
      ● SERVICE 1 -> SERVICE 2 (REQUEST / RESPONSE): Span Id = B, with Client Send, Server Received, Server Sent, Client Received
      ● SERVICE 2: Span Id = C
      ● SERVICE 2 -> SERVICE 3 (REQUEST / RESPONSE): Span Id = D, with Client Send, Server Received, Server Sent, Client Received
      ● SERVICE 3: Span Id = E
      ● SERVICE 2 -> SERVICE 4 (REQUEST / RESPONSE): Span Id = F, with Client Send, Server Received, Server Sent, Client Received
      ● SERVICE 4: Span Id = G
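As a rough sketch of how the ids in this diagram can travel over HTTP, the snippet below builds the Zipkin B3 headers (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`) for one outgoing call. The class and method names are hypothetical and this is not Sleuth's implementation; the single-letter ids are the slide's placeholders, whereas real ids are hexadecimal strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not Sleuth's implementation.
final class B3PropagationSketch {

    // Builds the headers a client attaches when it starts a child span for an outgoing call.
    static Map<String, String> outgoingHeaders(String traceId, String currentSpanId, String childSpanId) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("X-B3-TraceId", traceId);            // unchanged across the whole trace
        headers.put("X-B3-SpanId", childSpanId);         // the span that covers this call
        headers.put("X-B3-ParentSpanId", currentSpanId); // the caller's current span
        return headers;
    }

    public static void main(String[] args) {
        // SERVICE 1 (span A) calls SERVICE 2 using span B, as in the diagram.
        System.out.println(outgoingHeaders("X", "A", "B"));
    }
}
```

The receiving service reads the same three headers, records Server Received and Server Sent against the span id it was given, and starts its own child span for local work.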
  22. Span Id = A, Parent Id = null
      Span Id = B, Parent Id = A
      Span Id = C, Parent Id = B
      Span Id = D, Parent Id = C
      Span Id = E, Parent Id = D
      Span Id = F, Parent Id = C
      Span Id = G, Parent Id = F
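The parent ids on this slide are enough to rebuild the whole trace tree. The sketch below does that in plain Java using only the span/parent pairs from the slide; the class `SpanTreeSketch` and its methods are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Spring Cloud Sleuth.
final class SpanTreeSketch {

    // Span id -> parent span id, copied from the slide; the root span has a null parent.
    static Map<String, String> parentsFromSlide() {
        Map<String, String> parents = new LinkedHashMap<>();
        parents.put("A", null);
        parents.put("B", "A");
        parents.put("C", "B");
        parents.put("D", "C");
        parents.put("E", "D");
        parents.put("F", "C");
        parents.put("G", "F");
        return parents;
    }

    // Number of parent links between a span and the root span.
    static int depth(Map<String, String> parents, String spanId) {
        int depth = 0;
        for (String p = parents.get(spanId); p != null; p = parents.get(p)) {
            depth++;
        }
        return depth;
    }

    // Direct children of a span, in the order the spans were listed.
    static List<String> children(Map<String, String> parents, String spanId) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : parents.entrySet()) {
            if (spanId.equals(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> parents = parentsFromSlide();
        for (String id : parents.keySet()) {
            // Indent each span by its depth so the output reads as a tree.
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth(parents, id); i++) {
                line.append("  ");
            }
            System.out.println(line.append(id));
        }
    }
}
```

Span C is the only one with two children (D and F), which matches SERVICE 2 calling both SERVICE 3 and SERVICE 4.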
  23. Is it that simple?
  24. Is context propagation simple?
      How do you pass tracing information (incl. Trace ID) between:
      • different libraries?
      • thread pools?
      • asynchronous communication?
      • …?
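The thread-pool case is a good example of why this is not trivial: a value stored in a `ThreadLocal` on the calling thread is not visible on a pooled worker thread. Below is a minimal sketch of one common workaround, capturing the context when the task is created and restoring it when the task runs. `TraceContextSketch`, `TRACE_ID` and `wrap` are hypothetical names; this shows the general technique, not Sleuth's instrumentation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration class; not part of Spring Cloud Sleuth.
final class TraceContextSketch {

    // Holds the trace id of whatever request the current thread is handling.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Captures the caller's trace id now and restores it when the task runs on another thread.
    static Runnable wrap(Runnable task) {
        final String captured = TRACE_ID.get();
        return () -> {
            String previous = TRACE_ID.get();
            TRACE_ID.set(captured);
            try {
                task.run();
            } finally {
                // Put the pooled thread back into the state it had before this task.
                if (previous == null) {
                    TRACE_ID.remove();
                } else {
                    TRACE_ID.set(previous);
                }
            }
        };
    }

    // Runs a task on a single-thread pool and returns the trace id that task observed.
    static String traceIdSeenByWorker(boolean wrapped) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            String[] seen = new String[1];
            Runnable task = () -> seen[0] = TRACE_ID.get();
            pool.submit(wrapped ? wrap(task) : task).get();
            return seen[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        TRACE_ID.set("X");
        System.out.println("unwrapped task sees: " + traceIdSeenByWorker(false));
        System.out.println("wrapped task sees:   " + traceIdSeenByWorker(true));
    }
}
```

The same capture-and-restore step has to be repeated for every executor, callback and client library in the application, which is the work a tracing library takes off your hands.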
  25. Log correlation with Spring Cloud Sleuth
      We take care of passing tracing information between threads / libraries / contexts for:
      • Hystrix
      • RxJava
      • RestTemplate
      • Feign
      • Messaging with Spring Integration
      • Zuul
      • ...
      If you don’t do anything unexpected, there’s nothing you need to do to make Sleuth work. Check the docs for more info.
  26. Spring Cloud Sleuth logging format
      We set a logging format for you...
  27. Now let’s aggregate the logs!
      Instead of SSHing to the machines to grep logs, let’s aggregate them!
      • With Cloud Foundry’s (CF) Loggregator the logs from different instances are streamed into a single place
      • You can harvest your logs with Logstash Forwarder / FileBeat
      • You can use the ELK stack to stream and visualize the logs
  28. Spring Cloud Sleuth with Maven
      <dependencyManagement>
        <dependencies>
          <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Camden.SR6</version>
            <type>pom</type>
            <scope>import</scope>
          </dependency>
        </dependencies>
      </dependencyManagement>
      <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-sleuth</artifactId>
      </dependency>
  29. SERVICE 1 /start -> SERVICE 2 -> SERVICE 3 and SERVICE 4 (each call is a REQUEST / RESPONSE pair)
      ● SERVICE 3 responds: “Hello from service3”
      ● SERVICE 4 responds: “Hello from service4”
      ● SERVICE 2 responds: “Hello from service2, response from service3 [Hello from service3] and from service4 [Hello from service4]”
  30. SERVICE 1 /readtimeout REQUEST BOOM! SERVICE 2 REQUEST BOOM! REQUEST BOOM!
  31. Log correlation with Spring Cloud Sleuth: DEMO
  35. Great! We’ve found the exception! But meanwhile....
  36. CLICK -> 200. The system is slow...
  37. One of the services is slow...
  38. Which one? How to measure that?
  39. Let’s log events!
      ● Client Send (CS) - The client has made a request
      ● Server Received (SR) - The server side got the request and will start processing
      ● Server Send (SS) - Annotated upon completion of request processing
      ● Client Received (CR) - The client has successfully received the response from the server side
  40. CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms
  41. Conclusions (CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms)
      ● The request started at T=0 ms
      ● It took 450 ms for the client to receive a response
      ● The server side received the request at T=100 ms
      ● The request got processed on the server side in 200 ms
  42. Conclusions (CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms)
      Why is there a delay between sending and receiving messages?!!11!one!?!1!
  43. https://blogs.oracle.com/jag/resource/Fallacies.html
  44. Distributed tracing - terminology
      • Span
      • Trace
      • Baggage
      • Logs (annotations)
      • Tags (binary annotations)
  45. Logs
      Represents an event in time associated with a span
      ● Every span has zero or more logs
      ● Each log is a timestamped event name
      ● The event should be the stable name of some notable moment in the lifetime of a span
        ○ For instance, a span representing a browser page load might add an event for each of the Performance.timing moments (check https://developer.mozilla.org/en-US/docs/Web/API/PerformanceTiming)
  47. Main logs (CS 0 ms, SR 100 ms)
      ● Client Send (CS)
        ○ The client has made a request - the span was started
      ● Server Received (SR)
        ○ The server side got the request and will start processing it
        ○ SR timestamp - CS timestamp = NETWORK LATENCY
  48. Main logs (CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms)
      ● Server Send (SS)
        ○ Annotated upon completion of request processing
        ○ SS timestamp - SR timestamp = SERVER SIDE PROCESSING TIME
      ● Client Received (CR)
        ○ The client has successfully received the response from the server side
        ○ CR timestamp - CS timestamp = TIME NEEDED TO RECEIVE RESPONSE
        ○ CR timestamp - SS timestamp = NETWORK LATENCY
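The four subtractions on these slides can be checked with the example timestamps (CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms). The sketch below is plain Java with hypothetical names. It assumes all four timestamps are on a common clock; in practice CS/CR come from the client and SR/SS from the server, so the two network figures are only as good as the clock synchronization between the hosts.

```java
// Hypothetical illustration class; not part of Spring Cloud Sleuth or Zipkin.
final class SpanTimingSketch {
    final long cs; // Client Send, in ms
    final long sr; // Server Received, in ms
    final long ss; // Server Send, in ms
    final long cr; // Client Received, in ms

    SpanTimingSketch(long cs, long sr, long ss, long cr) {
        this.cs = cs;
        this.sr = sr;
        this.ss = ss;
        this.cr = cr;
    }

    // SR - CS: network latency of the request.
    long requestNetworkLatency() { return sr - cs; }

    // SS - SR: server side processing time.
    long serverProcessingTime() { return ss - sr; }

    // CR - CS: time the client needed to receive the response.
    long timeToReceiveResponse() { return cr - cs; }

    // CR - SS: network latency of the response.
    long responseNetworkLatency() { return cr - ss; }

    public static void main(String[] args) {
        SpanTimingSketch t = new SpanTimingSketch(0, 100, 300, 450);
        System.out.println("request network latency:  " + t.requestNetworkLatency() + " ms");
        System.out.println("server processing time:   " + t.serverProcessingTime() + " ms");
        System.out.println("time to receive response: " + t.timeToReceiveResponse() + " ms");
        System.out.println("response network latency: " + t.responseNetworkLatency() + " ms");
    }
}
```

With the slide's numbers, 250 ms of the 450 ms total is spent on the network (100 ms out, 150 ms back) and 200 ms in the server.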
  49. Tag
      Key-value pair
      ● Every span may have zero or more key/value Tags
      ● They do not have timestamps and simply annotate the spans.
      ● Examples of default tags in Sleuth
        ○ message/payload-size
        ○ http.method
        ○ commandKey for Hystrix
  50. How to visualise latency in a distributed system?
  51. The answer is: ZIPKIN
  52. How does Zipkin work?
      ● APP: SPANS SENT TO COLLECTORS
      ● APP: SPANS SENT TO COLLECTORS
      ● COLLECTORS: STORE IN DB
      ● UI QUERIES FOR TRACE INFO VIA API
  53. What does Zipkin look like?
  54. Spring Cloud Sleuth and Zipkin integration
      ● We take care of passing tracing information between threads / libraries / contexts
      ● Upon closing of a Span we will send it to Zipkin
        ○ either via HTTP (spring-cloud-sleuth-zipkin)
        ○ or via Spring Cloud Stream (spring-cloud-sleuth-stream)
      ● You can run Zipkin Spring Cloud Stream Collector as a Spring Boot app (spring-cloud-sleuth-zipkin-stream)
        ○ you can add the dependency to Zipkin UI!
  55. Spring Cloud Sleuth Zipkin with Maven
      <dependencyManagement>
        <dependencies>
          <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Camden.SR6</version>
            <type>pom</type>
            <scope>import</scope>
          </dependency>
        </dependencies>
      </dependencyManagement>
      <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-zipkin</artifactId>
      </dependency>
  56. Hold it! If I have a billion services that emit a gazillion spans - won’t I kill Zipkin?
  57. Sampling to the rescue!
      ● By default Spring Cloud Sleuth sends only 10% of requests to Zipkin
      ● You can change that by setting the property spring.sleuth.sampler.percentage (for 100% pass 1.0)
      ● Or register a custom org.springframework.cloud.sleuth.Sampler implementation
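To illustrate what a percentage-based sampling decision can look like, here is a self-contained plain-Java sketch. It deliberately does not implement the real `org.springframework.cloud.sleuth.Sampler` interface and it is not how Sleuth's built-in sampler works; the class `PercentageSamplerSketch` and the bucket-by-trace-id approach are stand-ins chosen for this example. The constructor argument plays the role of `spring.sleuth.sampler.percentage` (0.1 for 10%, 1.0 for 100%).

```java
// Hypothetical illustration class; a stand-in technique, not Sleuth's sampler.
final class PercentageSamplerSketch {

    private static final long BUCKETS = 10_000L;

    // Number of buckets (out of BUCKETS) whose traces get exported.
    private final int threshold;

    PercentageSamplerSketch(double percentage) {
        if (percentage < 0.0 || percentage > 1.0) {
            throw new IllegalArgumentException("percentage must be between 0.0 and 1.0");
        }
        this.threshold = (int) Math.round(percentage * BUCKETS);
    }

    // Maps the trace id to a bucket and samples the trace if the bucket is below the threshold.
    boolean isSampled(long traceId) {
        return Math.floorMod(traceId, BUCKETS) < threshold;
    }

    public static void main(String[] args) {
        PercentageSamplerSketch tenPercent = new PercentageSamplerSketch(0.1);
        int sampled = 0;
        for (long traceId = 0; traceId < BUCKETS; traceId++) {
            if (tenPercent.isSampled(traceId)) {
                sampled++;
            }
        }
        System.out.println(sampled + " of " + BUCKETS + " traces would be sent to Zipkin");
    }
}
```

Deciding per trace id rather than per span keeps a trace either fully exported or fully skipped, so Zipkin never shows a tree with missing branches because of sampling.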
  58. SERVICE 1 /start, SERVICE 2 /foo, SERVICE 3 /bar, SERVICE 4 /baz, SZCZECIN SERVICE /szczecin (each call is a REQUEST / RESPONSE pair)
  59. DEMO
  60. Traced call
  61. Traced call
      ● 1st request
      ● Service1 calling Service2
      ● Service2 calling Service3
      ● Service2 calling Service4
  62. Traced call: RPC call, Tags, Events
  63. Traced call - error: click!
  64. Traced call - error
  65. Baggage
      ● Setting a baggage item
      ● Retrieving a baggage item
  66. Manipulating spans via annotations (from Sleuth 1.2.0)
      ● New span
      ● Continue span
  67. Summary
      ● Log correlation allows you to match logs for a given trace
      ● Distributed tracing allows you to quickly see latency issues in your system
      ● Zipkin is a great tool to visualize the latency graph and system dependencies
      ● Spring Cloud Sleuth integrates with Zipkin and grants you log correlation
      ● With 1.2.0 you’ll be able to propagate any information via baggage
      ● With 1.2.0 you’ll be able to use annotations to create / continue spans and add logs and tags
  68. Zipkin for Brewery
      ● A test app for Spring Cloud end to end tests
      ● Source code: https://github.com/spring-cloud-samples/brewery
      ● Around 10 applications involved
      ● Zipkin deployed to PCF for Brewery Sample app: http://docsbrewing-zipkin-server.cfapps.io
  69. Zipkin for Brewery
  70. Zipkin for Brewery
  72. Links
      ▪ Code for this presentation (clone and run getReadyForConference.sh - NOTE: you need Vagrant!): https://github.com/marcingrzejszczak/vagrant-elk-box/tree/presentation
      ▪ Sleuth samples: https://github.com/spring-cloud-samples/sleuth-documentation-apps
      ▪ Sleuth’s documentation: http://cloud.spring.io/spring-cloud-sleuth/
      ▪ Repo with Spring Boot Zipkin server: https://github.com/openzipkin/zipkin-java
      ▪ Zipkin deployed to PCF for Brewery Sample app: http://docsbrewing-zipkin-server.cfapps.io
      ▪ Pivotal Web Services trial: https://run.pivotal.io/
      ▪ PCF on your laptop: https://docs.pivotal.io/pcf-dev/
  73. Learn More. Stay Connected.
      ▪ Read the docs
      ▪ Check the samples
      ▪ Talk to us on Gitter
      Twitter: twitter.com/springcentral
      YouTube: spring.io/video
      LinkedIn: spring.io/linkedin
      Google Plus: spring.io/gplus
  74. mgrzejszczak
