
Introducing CallTracing (tm), based on RabbitMQ, Spring and Zipkin

Recorded at SpringOne2GX 2015
Presenter: Monish Unni
Data / Integration Track

Do you live in a world where stack traces aren’t quite enough? Is there no easy way to predict how a certain set of services might be called or what their usage patterns are? Does everything work in DIT/SIT/UAT/PELT until you hit production, where strange things start to happen because the services are distributed? Solution: use RabbitMQ (AMQP protocol) and Spring proxies/interceptors to enable out-of-band instrumentation that traces requests and shows how they perform in a distributed system. In 2014, as part of infrastructure-wide changes, I introduced calltracing(tm) as a way to correlate requests from a single user across E*Trade's heterogeneous systems. This "trace" is then consumed by various big-data analytic tools to produce aggregated reports. Zipkin(tm) is a collector, digester and visualization front-end for the aggregated data. In other words, it’s a distributed tracing system that can show timing data for services that run on various nodes. Zipkin manages both collection and lookup of data through a collector and a query service. In this session, I will discuss specifically how E*Trade’s disparate services are stitched together using RabbitMQ (AMQP protocol) and Spring proxies to form the enablement tier that provides data to Zipkin.

  1. SPRINGONE2GX WASHINGTON, DC • Call Tracing • Monish Unni, monish.unni@gmail.com • Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
  2. About Me • Monish Unni • Programmer | Architect • Java | Spring | Scala | NodeJS
  3. Contents • Background • Use case • Core Concepts • Demo • Minimal Tracing • Zipkin Tracing • Demo • Production Deployment • Implementing suggestions • Exceptions to the model • Lessons Learnt
  4. Background • In the beginning we had • Clearly specified contracts and roles • Well known topology and flow o web servers, application servers, services, databases • Prescheduled down times • Stacktraces were good enough o for the most part.
  5. Background • Today we have microservices that are • A collection of software modules • Developed by different teams • Written in different programming languages • Spanning many machines • Spread across multiple physical facilities
  6. Background • Developers have checked in and tested code • SRE monitors services, queues and databases • queue surges • database tracing • disk corruption • load re-balancing
  7. Background • SRE needs to • shut down and redeploy services • map a service to other services • map a service to database/tables • figure out weak links in a system • figure out critical paths
  8. Background • SRE needs to • debug the system • figure out where the bottlenecks are • determine whether request paths are correct • correlate call tree with system behavior.
  9. • In April 2010, Google published a research paper known as “Dapper”: “Modern Internet services are often implemented as complex, large-scale distributed systems. These applications are constructed from collections of software modules that may be developed by different teams, perhaps in different programming languages, and could span many thousands of machines across multiple physical facilities.” -- Google Dapper Paper
  10. • Using an in-house dapper-like framework, we are able to layer the request demand through the aggregate infrastructure onto a simple visualization - Netflix, Feb 2015 [1] • ..help us gather timing data for all the disparate services involved in managing a request to the Twitter API... - Twitter, June 2012 [2] • Real-time distributed tracing and call graphs have proven to be indispensable for insight into the service performance of web applications, and the associated cost across the entire technology stack at LinkedIn. - LinkedIn, Feb 2015 [3] • Talk on “Call Tracing” - SpringOne2GX, Sept 2015 • [1] http://techblog.netflix.com/2015_02_01_archive.html • [2] https://blog.twitter.com/2012/distributed-systems-tracing-with-zipkin • [3] https://engineering.linkedin.com/distributed-service-call-graph/real-time-distributed-tracing-website-performance-and-efficiency
  11. Services?
  12. Services (guess): Contacts, Presence, Circles, Hangout audio, Hangout video, Labels/Tags
  13. Presence Service
  14. A single request can run across 100s/1000s of machines/services and involve hundreds of different subsystems.
  15. Call Tracing Concepts
  16. [Diagram: Contacts, Presence, Session]
  17. [Diagram: Contacts, Presence, Session]
  18. [Diagram: Contacts, Presence, Session] TraceId or GUID
  19. [Diagram: Contacts, Presence, Session] TraceId or GUID
  20. [Diagram: Contacts, Presence, Session] SpanId or RPC id
  21. Review • TraceId • a globally unique immutable Id • correlates to a request • SpanId • a unique id for an RPC
  22. Demo • Start two services • server_a, server_b • client calls server_a • server_a calls server_b • show trace UX
  23. [Diagram: Contacts, Presence, Session on node-1, node-2, node-3; label: tid] Goal: Stitch call tree from local logs • Node Log View • node-1: tid • node-2: (empty) • node-3: (empty)
  24. [Diagram: Contacts, Presence, Session on node-1, node-2, node-3; labels: tid, spid-1] Goal: Stitch call tree from local logs • Node Log View • node-1: tid, spid-1 • node-2: (empty) • node-3: (empty)
  25. [Diagram: Contacts, Presence, Session on node-1, node-2, node-3; labels: (tid, spid-1), tid, spid-1] Goal: Stitch call tree from local logs • Node Log View • node-1: tid, spid-1 • node-2: tid, spid-1 • node-3: (empty)
  26. [Diagram: Contacts, Presence, Session on node-1, node-2, node-3; labels: (tid, spid-1), spid-1, spid-2, tid] Goal: Stitch call tree from local logs • Node Log View • node-1: tid, spid-1 • node-2: tid, spid-1, spid-2 • node-3: (empty)
  27. [Diagram: Contacts, Presence, Session on node-1, node-2, node-3; labels: (tid, spid-1), spid-1, (tid, spid-2), spid-2, tid] Goal: Stitch call tree from local logs • Node Log View • node-1: tid, spid-1 • node-2: tid, spid-1, spid-2 • node-3: tid, spid-2
  28. [Diagram: Contacts, Presence, Session on node-1, node-2, node-3; labels: (tid, spid-1), spid-1, (tid, spid-2), spid-2, tid] Goal: Stitch call tree from local logs • Node Log View • node-1: tid, spid-1 • node-2: tid, spid-1, spid-2 • node-3: tid, spid-2
  29. [Diagram: Contacts, Presence, Session on node-1, node-2, node-3; labels: (tid, spid-1), tid, spid-1, (tid, spid-2), spid-2, spid-3, spid-4, (tid, spid-3), (tid, spid-4)] Goal: Stitch call tree from local logs • Node Log View • node-1: tid, spid-1, spid-3 • node-2: tid, spid-1, spid-2, spid-3 • node-3: tid, spid-2, spid-4 • Limit o can correlate all requests o cannot stitch call tree from logs
  30. Review • Black-box instrumentation • Pros o Application level transparency o TraceId, SpanId, Timestamp allow correlation of requests • Cons o Request paths or call trees require o reasoning based on statistical inference o derivation of causal relationships
  31. Uniform Logging Structure • TraceId, SpanId, ParentId, Event + Timestamp • Four key events with timestamps • Server Receive/Send (SR/SS) o duration for a service call • Client Send/Receive (CS/CR) o duration for a client call • ParentId • Optional – root span has none • Previous SpanId
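As a minimal sketch of the uniform logging structure above, the following Java class models one logged event and derives the two durations the slide names. The class name `TraceEvent`, its field names, and the five-field log line (the slide's four fields plus a timestamp) are assumptions made for illustration; they are not taken from the talk's implementation.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: one logged event carrying the fields named on the slide.
final class TraceEvent {
    enum Type { CS, SR, SS, CR }

    final Type type;
    final String traceId;
    final String spanId;
    final String parentId;       // null for the root span, which has no parent
    final long timestampMicros;

    TraceEvent(Type type, String traceId, String spanId, String parentId, long timestampMicros) {
        this.type = type;
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentId = parentId;
        this.timestampMicros = timestampMicros;
    }

    // "EVENT, traceId, spanId, parentId" as on the slides, with the timestamp
    // appended as a fifth field (an assumption of this sketch).
    String toLogLine() {
        return type + ", " + traceId + ", " + spanId + ", "
                + (parentId == null ? "n/a" : parentId) + ", " + timestampMicros;
    }

    // Time between the start and end events of one span, or -1 if either is missing.
    private static long between(List<TraceEvent> events, String spanId, Type start, Type end) {
        Long startTs = null;
        Long endTs = null;
        for (TraceEvent e : events) {
            if (!e.spanId.equals(spanId)) continue;
            if (e.type == start) startTs = e.timestampMicros;
            if (e.type == end) endTs = e.timestampMicros;
        }
        if (startTs == null || endTs == null) return -1L;
        return endTs - startTs;
    }

    // SR to SS: duration of the service call, measured on the server.
    static long serverDuration(List<TraceEvent> events, String spanId) {
        return between(events, spanId, Type.SR, Type.SS);
    }

    // CS to CR: duration of the client call, measured on the client.
    static long clientDuration(List<TraceEvent> events, String spanId) {
        return between(events, spanId, Type.CS, Type.CR);
    }

    public static void main(String[] args) {
        List<TraceEvent> span1b = Arrays.asList(
                new TraceEvent(Type.CS, "1", "1b", "1a", 100),
                new TraceEvent(Type.SR, "1", "1b", "1a", 130),
                new TraceEvent(Type.SS, "1", "1b", "1a", 170),
                new TraceEvent(Type.CR, "1", "1b", "1a", 210));
        for (TraceEvent e : span1b) System.out.println(e.toLogLine());
        System.out.println("server=" + serverDuration(span1b, "1b")
                + " client=" + clientDuration(span1b, "1b"));
    }
}
```

Each duration is a difference of two timestamps taken on the same host, so the sketch does not depend on clocks being synchronized across nodes.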
  32. [Diagram: Contacts, Presence, Session on node-1, node-2; events so far: SR, 1, 1a, n/a] Structure • Event: Server Received (SR) • TraceId: 1 • SpanId: 1a • ParentId: N/A
  33. [Diagram: Contacts, Presence, Session on node-1, node-2; events so far: SR, 1, 1a, n/a; CS, 1, 1b, 1a] Structure • Event: Client Sent (CS) • TraceId: 1 • SpanId: 1b • ParentId: 1a
  34. [Diagram: Contacts, Presence, Session on node-1, node-2; events so far: SR, 1, 1a, n/a; CS, 1, 1b, 1a; SR, 1, 1b, 1a] Structure • Event: Server Received (SR) • TraceId: 1 • SpanId: 1b • ParentId: 1a
  35. [Diagram: Contacts, Presence, Session on node-1, node-2; events so far: SR, 1, 1a, n/a; CS, 1, 1b, 1a; SR, 1, 1b, 1a; SS, 1, 1b, 1a] Structure • Event: Server Sent (SS) • TraceId: 1 • SpanId: 1b • ParentId: 1a
  36. [Diagram: Contacts, Presence, Session on node-1, node-2; events so far: SR, 1, 1a, n/a; CS, 1, 1b, 1a; SR, 1, 1b, 1a; SS, 1, 1b, 1a; CR, 1, 1b, 1a] Structure • Event: Client Received (CR) • TraceId: 1 • SpanId: 1b • ParentId: 1a
  37. [Diagram: Contacts, Presence, Session on node-1, node-2; events so far: SR, 1, 1a, n/a; CS, 1, 1b, 1a; SR, 1, 1b, 1a; SS, 1, 1b, 1a; CR, 1, 1b, 1a; SS, 1, 1a, n/a] Structure • Event: Server Sent (SS) • TraceId: 1 • SpanId: 1a • ParentId: N/A
  38. [Diagram: Contacts, Presence, Session on node-1, node-2; events: SR, 1, 1a, n/a; CS, 1, 1b, 1a; SR, 1, 1b, 1a; SS, 1, 1b, 1a; CR, 1, 1b, 1a; SS, 1, 1a, n/a] Span • CS, SR, SS, CR • TraceId: 1 • SpanId: 1b • ParentId: 1a
  39. Review • TraceId • remains unique across systems • Span • can identify RPC attributes • example: IP Address between hosts • Request/Response
  40. Warning! • We are going from 0 to 100 mph very quickly
  41. [Diagram: Contacts, Presence, Session on node-1, node-2, node-3, with the events below drawn on the arrows] Goal: Stitch call tree from local logs • Node Log View • node-1: SR, 1, 1a, n/a • node-2: (empty) • node-3: (empty)
  42. [Diagram: Contacts, Presence, Session on node-1, node-2, node-3, with the events below drawn on the arrows] Goal: Stitch call tree from local logs • Node Log View • node-1: SR, 1, 1a, n/a; CS, 1, 1b, 1a • node-2: SR, 1, 1b, 1a • node-3: (empty)
  43. [Diagram: Contacts, Presence, Session on node-1, node-2, node-3, with the events below drawn on the arrows] Goal: Stitch call tree from local logs • Node Log View • node-1: SR, 1, 1a, n/a; CS, 1, 1b, 1a • node-2: SR, 1, 1b, 1a; CS, 1, 1c, 1b • node-3: SR, 1, 1c, 1b
  44. [Diagram: Contacts, Presence, Session on node-1, node-2, node-3, with the events below drawn on the arrows] Goal: Stitch call tree from local logs • Node Log View • node-1: SR, 1, 1a, n/a; CS, 1, 1b, 1a • node-2: SR, 1, 1b, 1a; CS, 1, 1c, 1b; CR, 1, 1c, 1b • node-3: SR, 1, 1c, 1b; SS, 1, 1c, 1b
  45. [Diagram: Contacts, Presence, Session on node-1, node-2, node-3, with the events below drawn on the arrows] Goal: Stitch call tree from local logs • Node Log View • node-1: SR, 1, 1a, n/a; CS, 1, 1b, 1a; CR, 1, 1b, 1a • node-2: SR, 1, 1b, 1a; CS, 1, 1c, 1b; CR, 1, 1c, 1b; SS, 1, 1b, 1a • node-3: SR, 1, 1c, 1b; SS, 1, 1c, 1b
  46. [Diagram: Contacts, Presence, Session on node-1, node-2, node-3, with the events below drawn on the arrows] Goal: Stitch call tree from local logs • Node Log View • node-1: SR, 1, 1a, n/a; CS, 1, 1b, 1a; CR, 1, 1b, 1a; SS, 1, 1a, n/a • node-2: SR, 1, 1b, 1a; CS, 1, 1c, 1b; CR, 1, 1c, 1b; SS, 1, 1b, 1a • node-3: SR, 1, 1c, 1b; SS, 1, 1c, 1b
  47. [Diagram: Contacts, Presence, Session on node-1, node-2, node-3, with the events below drawn on the arrows] Goal: Stitch call tree from local logs • Node Log View • node-1: SR, 1, 1a, n/a; CS, 1, 1b, 1a; CR, 1, 1b, 1a; CS, 1, 1c, 1a; CR, 1, 1c, 1a; SS, 1, 1a, n/a • node-2: SR, 1, 1b, 1a; CS, 1, 1c, 1b; CR, 1, 1c, 1b; SR, 1, 1c, 1a; SS, 1, 1c, 1a; SS, 1, 1b, 1a • node-3: SR, 1, 1c, 1b; SS, 1, 1c, 1b
  48. Review • Annotation based instrumentation • Pros o End-to-end per request visibility o Dependency of a single service on upstream and downstream services o Critical path or long tail latency can be identified • Cons o Requires instrumentation of programs
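The goal repeated on the preceding slides, stitching a call tree from local logs, can be sketched as follows once every event carries a ParentId. This is an illustrative Java example, not the talk's code: the class `CallTreeStitcher`, the `EVENT, traceId, spanId, parentId` line format, and the use of `n/a` for a missing parent are assumptions modeled on the node log views. It also assumes span ids are unique within a trace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative only: rebuilds parent/child span relationships from merged node logs.
final class CallTreeStitcher {
    private static final String ROOT = ""; // key under which root spans are stored

    // Maps each parent span id to its child span ids for one trace.
    static Map<String, List<String>> childrenByParent(List<String> logLines, String traceId) {
        Map<String, List<String>> children = new TreeMap<>();
        Set<String> seen = new HashSet<>();
        for (String line : logLines) {
            String[] f = line.trim().split("\\s*,\\s*");
            if (f.length != 4 || !f[1].equals(traceId)) continue; // other trace or malformed
            String spanId = f[2];
            String parentId = f[3];
            if (spanId.equals(parentId)) continue;  // guard against self-parenting
            if (!seen.add(spanId)) continue;        // a span logs several events; count it once
            String key = parentId.equals("n/a") ? ROOT : parentId;
            children.computeIfAbsent(key, k -> new ArrayList<>()).add(spanId);
        }
        return children;
    }

    // Renders the tree with two spaces of indentation per call depth.
    static String render(Map<String, List<String>> children) {
        StringBuilder out = new StringBuilder();
        render(children, ROOT, 0, out);
        return out.toString();
    }

    private static void render(Map<String, List<String>> children, String parent,
                               int depth, StringBuilder out) {
        for (String span : children.getOrDefault(parent, Collections.<String>emptyList())) {
            for (int i = 0; i < depth; i++) out.append("  ");
            out.append(span).append('\n');
            render(children, span, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Logs from node-1, node-2 and node-3, concatenated in any order.
        List<String> logs = Arrays.asList(
                "SR, 1, 1a, n/a", "CS, 1, 1b, 1a", "CR, 1, 1b, 1a", "SS, 1, 1a, n/a",
                "SR, 1, 1b, 1a", "CS, 1, 1c, 1b", "CR, 1, 1c, 1b", "SS, 1, 1b, 1a",
                "SR, 1, 1c, 1b", "SS, 1, 1c, 1b");
        System.out.print(render(childrenByParent(logs, "1")));
    }
}
```

Without the ParentId field, the same logs would only show which spans belong to the trace, which is the limit noted for the black-box approach.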
  49. Trace collectors • Out of band trace collection o In band trace data adds application overhead o Many systems continue munging data after results are returned o Kafka, Flume, Scribe collectors are most popular o Syslog-ng requires customization
  50. Trace collection overhead • PLT test o collect empirical data o tune sampling o second round sampling on collection • Ability to switch off tracing o No impact to in band services
  51. Flume [Diagram: Contacts, Presence, Session, each with a Flume Sink; Collector; Cassandra; Query; Web; Developers]
  52. Demo • Generate Traces • Aspects of Zipkin UX • Aggregate Services Mapped
  53. Anatomy of a Trace • Trace • root span relating to many spans • Span • set of annotations • Annotation • records time of event + custom • Sampling • boolean
  54. Sampling • Used to control the overhead • Accomplish by using Trace Filters • Fixed • Adaptive • Debug
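To make the Fixed and Debug filters above more concrete, here is a hedged Java sketch of a fixed-rate sampling decision. It is not Zipkin's or Brave's actual trace filter: the class `FixedRateSampler`, the 10,000-bucket resolution, and the `salt` parameter are assumptions for illustration. The decision is derived from the trace id so that it is repeatable for a given trace, and a debug flag bypasses sampling.

```java
// Illustrative only: a fixed-rate sampling decision with a debug override.
final class FixedRateSampler {
    private static final long BUCKETS = 10_000L;

    private final long threshold; // number of buckets (out of 10,000) that are kept
    private final long salt;      // mixed into the trace id before bucketing

    FixedRateSampler(double rate, long salt) {
        if (rate < 0.0 || rate > 1.0) {
            throw new IllegalArgumentException("rate must be in [0, 1]");
        }
        this.threshold = Math.round(rate * BUCKETS);
        this.salt = salt;
    }

    // Returns true when the trace should be recorded.
    boolean isSampled(long traceId, boolean debug) {
        if (debug) return true; // debug traces are always kept
        long bucket = Math.floorMod(traceId ^ salt, BUCKETS); // always in 0..9999
        return bucket < threshold;
    }

    public static void main(String[] args) {
        FixedRateSampler sampler = new FixedRateSampler(0.1, 0L);
        int kept = 0;
        for (long id = 0; id < 1_000; id++) {
            if (sampler.isSampled(id, false)) kept++;
        }
        System.out.println("kept " + kept + " of 1000 traces");
    }
}
```

Processes that need to reach the same decision for a given trace would have to share both the rate and the salt, or propagate the decision with the request instead of recomputing it.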
  55. Annotation • Timestamp events • SR – server received, SS – server sent • CS – client sent, CR – client received
  56. Annotation
      # some event took place, either one by the framework or by the user
      struct Annotation {
        1: i64 timestamp             # microseconds from epoch
        2: string value              # what happened at the timestamp?
        3: optional Endpoint host    # host this happened on
        4: optional i32 duration     # how long did the operation take? microseconds
      }
  57. Annotation.Endpoint
      # this represents a host and port in a network
      struct Endpoint {
        1: i32 ipv4,
        2: i16 port
        3: string service_name       # which service did this operation happen on?
      }
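The `Endpoint` struct stores the address as an `i32` and the port as an `i16`, and Thrift's integer types are signed. The Java sketch below shows one way to pack a dotted-quad IPv4 address and a port into those widths and read them back. The class and method names are hypothetical, and big-endian (network) byte order is an assumption of this sketch.

```java
// Illustrative only: packing host and port into the signed widths used by Endpoint.
final class EndpointEncoding {

    // Packs "a.b.c.d" into a signed 32-bit int, most significant octet first.
    static int ipv4ToInt(String dottedQuad) {
        String[] parts = dottedQuad.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + dottedQuad);
        }
        int packed = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            packed = (packed << 8) | octet;
        }
        return packed;
    }

    // Reverses ipv4ToInt; unsigned shifts keep high octets from sign-extending.
    static String intToIpv4(int packed) {
        return ((packed >>> 24) & 0xFF) + "." + ((packed >>> 16) & 0xFF) + "."
                + ((packed >>> 8) & 0xFF) + "." + (packed & 0xFF);
    }

    // Ports above 32767 wrap to negative values when stored in a signed 16-bit field.
    static short portToI16(int port) {
        if (port < 0 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return (short) port;
    }

    // Recovers the original port by masking to the low 16 bits.
    static int i16ToPort(short stored) {
        return stored & 0xFFFF;
    }

    public static void main(String[] args) {
        int ip = ipv4ToInt("192.168.1.10");
        short port = portToI16(50000);
        System.out.println(ip + " -> " + intToIpv4(ip));
        System.out.println(port + " -> " + i16ToPort(port));
    }
}
```

Addresses whose first octet is 128 or higher, and ports above 32767, appear as negative numbers in the stored fields, so readers of the raw data need to apply the same masking.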
  58. Span • TraceId • name • Id • ParentSpanId • Annotation List • Binary Annotation List
  59. Span
      struct Span {
        1: i64 trace_id                                 # unique trace id, use for all spans in trace
        3: string name,                                 # span name, rpc method for example
        4: i64 id,                                      # unique span id, only used for this span
        5: optional i64 parent_id,                      # parent span id
        6: list<Annotation> annotations,                # list of all annotations/events that occurred
        8: list<BinaryAnnotation> binary_annotations    # any binary annotations
      }
  60. Communicate a Trace • Protocol headers • X-B3-TraceId • X-B3-SpanId • X-B3-ParentSpanId • Sampling • ~426 bytes
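As a sketch of how the protocol headers above might be filled in for an outbound call, the Java example below builds the header map for a child span. The three id header names come from the slide. The `X-B3-Sampled` name with `1`/`0` values and the 16-character lower-hex id encoding follow the B3 propagation convention rather than anything stated in the talk, and the class `B3Headers` and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: builds the trace headers a client attaches to an outbound request.
final class B3Headers {
    static final String TRACE_ID = "X-B3-TraceId";
    static final String SPAN_ID = "X-B3-SpanId";
    static final String PARENT_SPAN_ID = "X-B3-ParentSpanId";
    static final String SAMPLED = "X-B3-Sampled"; // assumed name for the slide's "Sampling"

    // Headers for a call made from currentSpanId; the callee sees childSpanId as its span.
    static Map<String, String> forChildCall(long traceId, long currentSpanId,
                                            long childSpanId, boolean sampled) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put(TRACE_ID, hex(traceId));           // unchanged across the whole trace
        headers.put(SPAN_ID, hex(childSpanId));        // new id for this RPC
        headers.put(PARENT_SPAN_ID, hex(currentSpanId)); // the caller's span becomes the parent
        headers.put(SAMPLED, sampled ? "1" : "0");
        return headers;
    }

    // Fixed-width, lower-case hex; negative ids print as their two's complement bits.
    static String hex(long id) {
        return String.format("%016x", id);
    }

    static long parseHex(String value) {
        return Long.parseUnsignedLong(value, 16);
    }

    public static void main(String[] args) {
        Map<String, String> headers = forChildCall(1L, 0x1aL, 0x1bL, true);
        for (Map.Entry<String, String> e : headers.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

The same key/value pairs could travel as message headers on a queue instead of HTTP headers; the sketch only covers building the map.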
  61. Instrumentation Points • Each framework may be different • The ones implemented • Spring • Events, proxies and interceptors • Netty • Message headers, EventBus • Loggers • Logback, Log4j
  62. Spring Integration • AbstractReplyProducingMessageHandler o service-activator o handleRequestMessage o inInterceptor o handleMessage o outInterceptor
  63. Implementing a Library • To record SR/SS timestamp annotations o def preProcess(service: String, headers: Map, annot: Map) o def postProcess() • To record CS/CR timestamp annotations o def preClientExec(service: String, headers: Map, annot: Map) o def postClientExec() • Record traces o def trace(message: String)
  64. Initialize your System • List<SpanCollector> spanCollector o example: Log4j, Kafka, Scribe, Syslog • List<TraceFilter> traceFilters o example: FixedRate, Adaptive, Debug • ServerTracer, ClientTracer o spanCollector and traceFilter
  65. Pre-process API • If part of a Trace o set state from current trace headers o generate a new span • If not part of a Trace o generate a unique traceId o may set spanId = traceId • set serverReceived (SR)
  66. Post-process API • add annotations o host, sessionid, jvm id, threadId • set serverSend (SS) timestamp annotation • Operations are symmetrical o preClientExec o postClientExec
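The pre-process and post-process steps on the two preceding slides can be sketched in Java as below. This is not the library from the talk: `ServerTraceSketch`, `SpanState`, the string form of the annotations, and passing the clock value as a parameter are assumptions. "Generate a new span" is read here as creating a span record from the ids in the incoming headers, matching the earlier slides where CS and SR share the same span id. Header lookup in a plain `Map` is case-sensitive, which a real integration would need to handle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: server-side pre/post processing around one request.
final class ServerTraceSketch {

    static final class SpanState {
        final String service;
        final long traceId;
        final long spanId;
        final Long parentId; // null when this is the root span
        final List<String> annotations = new ArrayList<>();

        SpanState(String service, long traceId, long spanId, Long parentId) {
            this.service = service;
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentId = parentId;
        }
    }

    // Joins the trace described by the headers, or starts a new one, then records SR.
    static SpanState preProcess(String service, Map<String, String> headers, long nowMicros) {
        String traceHeader = headers.get("X-B3-TraceId");
        String spanHeader = headers.get("X-B3-SpanId");
        SpanState span;
        if (traceHeader != null && spanHeader != null) {
            // Part of a trace: set state from the current trace headers.
            Long parentId = null;
            String parentHeader = headers.get("X-B3-ParentSpanId");
            if (parentHeader != null) parentId = parseId(parentHeader);
            span = new SpanState(service, parseId(traceHeader), parseId(spanHeader), parentId);
        } else {
            // Not part of a trace: generate a trace id and reuse it as the root span id.
            long traceId = ThreadLocalRandom.current().nextLong();
            span = new SpanState(service, traceId, traceId, null);
        }
        span.annotations.add("sr@" + nowMicros);
        return span;
    }

    // Adds caller-supplied annotations (host, session id, ...) and then records SS.
    static void postProcess(SpanState span, Map<String, String> extra, long nowMicros) {
        for (Map.Entry<String, String> e : extra.entrySet()) {
            span.annotations.add(e.getKey() + "=" + e.getValue());
        }
        span.annotations.add("ss@" + nowMicros);
    }

    private static long parseId(String hex) {
        return Long.parseUnsignedLong(hex, 16);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-B3-TraceId", "0000000000000001");
        headers.put("X-B3-SpanId", "000000000000001b");
        headers.put("X-B3-ParentSpanId", "000000000000001a");
        SpanState span = preProcess("presence", headers, 130L);
        Map<String, String> extra = new HashMap<>();
        extra.put("host", "node-2");
        postProcess(span, extra, 170L);
        System.out.println(span.annotations);
    }
}
```

The client-side pair (`preClientExec`, `postClientExec`) would follow the same shape with CS and CR in place of SR and SS.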
  67. TraceId generation
      /**
       * This salt is used to prevent traceId collisions between machines.
       * By giving each system a random salt, it is less likely that two
       * processes will sample the same subset of trace ids.
       */
      private val salt = new Random().nextLong()
      • Rejoinder o ..when you have no colocated services
  68. TraceId generation • UUID based o https://github.com/cogitate/twitter-zipkin-uuid o Util o Finagle o Ostrich o Server
  69. TraceId generation • UUID support o SQLite o Cassandra • Circa April 2015
  70. Useful Zipkin APIs • /traces/{id} • /api/trace/{id} • /api/query • /api/services • /api/spans
  71. Production coverage • Common RPC libraries provide leverage • Some coverage may be lost due to o non-instrumentable legacy systems o non-standard control flow
  72. Non-standard use cases • Routers and Forwarders • Netty-like dispatchers • Non-participating trace nodes o To Trace or Not to Trace
  73. [Diagram: Contacts, Router, Service on node-1, node-2, node-3, with the events below drawn on the arrows] Goal: Stitch call tree from local logs • Node Log View • node-1: SR, 1, 1a, n/a; CS, 1, 1b, 1a; CR, 1, 1b, 1a; SS, 1, 1a, n/a • node-2: SR, 1, 1b, 1a; CS, 1, 1c, 1b • node-3: SR, 1, 1c, 1b; SS, 1, 1c, 1b
  74. [Diagram: Contacts, Router, Service on node-1, node-2, node-3, with the events below drawn on the arrows] Goal: Stitch call tree from local logs • Node Log View • node-1: SR, 1, 1a, n/a; CS, 1, 1b, 1a; CR, 1, 1b, 1a; SS, 1, 1a, n/a • node-2: SR, 1, 1b, 1a; CS, 1, 1c, 1b • node-3: SR, 1, 1c, 1b; SS, 1, 1c, 1b • Where’s my span?
  75. Non-standard use cases • Reactive, Netty, Asynchronous callbacks o Request sent on a different thread o Response received on a different thread • Messaging pattern o getSpan (get context) o setSpan (set context)
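When the request is sent and the response is received on different threads, a thread-local span is not visible where the work runs. The Java sketch below illustrates the getSpan/setSpan idea from the slide by capturing the span on the submitting thread and restoring it on the worker. The class `SpanContext` and its `propagate` helper are hypothetical, and a `String` stands in for whatever span object a real tracer would carry.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: hands the current span from one thread to another.
final class SpanContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // getSpan: read the span bound to the calling thread, or null if none.
    static String getSpan() {
        return CURRENT.get();
    }

    // setSpan: bind a span to the calling thread; null clears the binding.
    static void setSpan(String spanId) {
        if (spanId == null) {
            CURRENT.remove();
        } else {
            CURRENT.set(spanId);
        }
    }

    // Captures the caller's span now and installs it around the task when it runs.
    static <T> Callable<T> propagate(Callable<T> task) {
        String captured = getSpan();
        return () -> {
            String previous = getSpan();
            setSpan(captured);
            try {
                return task.call();
            } finally {
                setSpan(previous); // pooled threads must not keep a stale span
            }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            setSpan("1b");
            Callable<String> readSpan = SpanContext::getSpan;
            System.out.println("wrapped:   " + pool.submit(propagate(readSpan)).get());
            System.out.println("unwrapped: " + pool.submit(readSpan).get());
        } finally {
            setSpan(null);
            pool.shutdown();
        }
    }
}
```

Restoring the previous value in the `finally` block matters with thread pools, because a worker thread is reused for unrelated requests after the task completes.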
  76. Lessons Learnt • Development teams use the tracing system to debug o Surprised by sampling • Requirement to generate TraceId ALWAYS • Multipage group tracing o UberTrace – List<TraceId> • Use collectors that you already have o add converters • Monitor o Disk Space o Collectors buffering
  77. Lessons Learnt • Dependency Hell o control libthrift version • Misunderstanding Cassandra o CQL o Cassandra-CLI • Need for different consumers o Elasticsearch o Hadoop • Aggregations o Scalding, Algebird, Spark o Steep learning curve
  78. Lessons Learnt • Be ready to interact with INFRA o Switch/Port, Dynamic Port Info o SSD vs Traditional Hard disk o Tools to probe system information • Network failure points • Capacity planning and sizing • Understand Storage systems o constraints and limitations
  79. Call Tracing • Dedicated to a full view of a request’s path through SOA • Based on Google’s Dapper paper • Zipkin, Brave, Node-Tryfer • Scala, Java, NodeJS • Zipkin • Collector, Query and UI
  80. Acknowledgements • SpringOne2GX Team! • Jeff Smick – Twitter • Kristof Adriaenssens – Brave • Node Tryfer - Rackspace • Eirik Sletteberg - Aggregates
  81. Stay Connected • https://www.linkedin.com/in/cogitate
  82. Q&A
