
Zipkin - Strangeloop



Zipkin is a distributed tracing system that helps us gather timing data for all the disparate services at Twitter, and manages collection and lookup of data through a Collector and a Query service. With Zipkin, we can trace a subset of all requests made to the site, and collect detailed data about the path taken through our systems, as well as timings. Then, we can visualize and ultimately pinpoint where and possibly why a response took longer than expected.
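
Tracing only a subset of requests implies that the sampling decision is made once, where a request enters the system, and then travels with the request to every downstream service. The following is a minimal Scala sketch of that idea, assuming a fixed sample rate and random ids. `TraceHeader` mirrors the fields of the `RequestHeader` struct shown in the slides; `Tracing`, `newRoot`, `childOf`, `adoptOrStart` and `sampleRate` are hypothetical names for illustration, not Finagle or Zipkin APIs.

```scala
import scala.util.Random

// Same fields as the RequestHeader struct in the slides.
final case class TraceHeader(
  traceId: Long,
  spanId: Long,
  parentSpanId: Option[Long],
  sampled: Option[Boolean]
)

// Hypothetical helper object; names and sampling strategy are assumptions.
object Tracing {
  // Entry point of a request: new random trace id, sampling decided once.
  def newRoot(sampleRate: Double, rng: Random): TraceHeader =
    TraceHeader(
      traceId = rng.nextLong(),
      spanId = rng.nextLong(),
      parentSpanId = None,
      sampled = Some(rng.nextDouble() < sampleRate)
    )

  // Outgoing call: keep trace id and sampling decision, generate a new
  // span id, and record the caller's span id as the parent.
  def childOf(parent: TraceHeader, rng: Random): TraceHeader =
    parent.copy(spanId = rng.nextLong(), parentSpanId = Some(parent.spanId))

  // Receiving side: adopt the incoming header if one exists,
  // otherwise start a new trace.
  def adoptOrStart(
    incoming: Option[TraceHeader],
    sampleRate: Double,
    rng: Random
  ): TraceHeader =
    incoming.getOrElse(newRoot(sampleRate, rng))
}
```

Because the `sampled` flag is copied into every child header, all services agree on whether a given request is traced, so a trace is either collected in full or not at all.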



  1. 1. ZIPKIN: A distributed tracing framework
  2. 2. Why Zipkin?
  4. 4. Google, 0.5 sec slower: 20% traffic drop. 1.5 sec faster: CTR up 12%. @skr | @thisisfranklin
  5. 5. Performance matters
  7. 7. Front end | Back end
  11. 11. Picture from http://www.flickr.com/photos/jpellgen
  13. 13. • Collects traces from production requests • Low overhead • Minimum of extra work for developers
  14. 14. Finagle “Finagle is an asynchronous network stack for the JVM that you can use to build asynchronous Remote Procedure Call (RPC) clients and servers in Java, Scala, or any JVM-hosted language.” github.com/twitter/finagle
  15. 15. What to capture?
  24. 24. Diagram: Finagle Http service, Finagle Thrift Service, Service
  25. 25. Zipkin terminology ‣ Annotation: string data associated with a particular timestamp, service, and host. Example: time: 2012-01-21 22:37:01; value: “something happened”; server: 135.34.53.2; service: “timelineservice”
  31. 31. ‣ Span: represents one specific method call; made up of a set of annotations. Has a name and an id. Timeline: T:0ms Client Send; T:10ms Server Receive; T:20ms Read 30 kbytes from file; T:90ms Server Send; T:100ms Client Receive ‣ Trace: a set of spans all associated with the same request
  39. 39. Diagram: Finagle http service, Finagle thrift service • Generate a random i64 trace id • Decide if we should sample the trace or not • Generate new span id • Pass trace header: struct RequestHeader { i64 trace_id, i64 span_id, optional i64 parent_span_id, optional bool sampled } • Thrift service adopts trace id from header if it exists
  45. 45. Diagram: Finagle http service (S), three Finagle thrift services (each S), Zipkin collector, Cassandra, Zipkin Query, Zipkin UI
  46. 46. Finagle ♥ Zipkin
  47. 47. Finagle Thrift server: ServerBuilder().bindTo(address).codec(ThriftServerFramedCodec()).name("servicename").build(someService)
  51. 51. ServerBuilder().bindTo(address).codec(ThriftServerFramedCodec()).name("servicename").tracerFactory(ZipkinTracer()).build(someService)
  52. 52. ClientBuilder().cluster(hosts).codec(ThriftClientFramedCodec()).name("clientname").tracerFactory(ZipkinTracer()).build()
  55. 55. Trace.record("doing stuff") Resulting annotation: time: 2012-01-21 22:37:01; value: “doing stuff”; server: 135.34.53.2; service: “timelineservice”
  57. 57. Trace.recordBinary("key", data) Example key/value pairs: responsecode = 500; cache:somekey = Hit; sql.query = select *...
  58. 58. Platform   | Protocol | Client | Server
          Finagle    | Thrift   | Yes    | Yes
          Finagle    | HTTP     | Yes    | Yes
          Finagle    | Memcache | Yes    | No
          Finagle    | Redis    | Yes    | No
          Cassie     | Thrift   | Yes    | No
          Querulous  | JDBC     | Yes    | No
          Ruby       | Thrift   | Yes    | Yes
  59. 59. Zipkin UI
  63. 63. What did we find?
  64. 64. github.com/twitter/zipkin @zipkinproject
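
The terminology slides define an annotation, a span and a trace. The following is a minimal Scala sketch of that data model, using the timings from the span slide to show how the four core annotations split a call into client time, server time and network time. The class shapes, the `between` helper, the span name `getTimeline` and the client host and service are illustrative assumptions, not Zipkin's actual types.

```scala
// Annotation: a timestamped string recorded by one host for one service.
final case class Annotation(
  timestampMs: Long,
  value: String,
  host: String,
  service: String
)

// Span: one method call, with a name, an id, and a set of annotations.
final case class Span(
  name: String,
  id: Long,
  traceId: Long,
  annotations: Seq[Annotation]
) {
  private def at(value: String): Option[Long] =
    annotations.find(_.value == value).map(_.timestampMs)

  // Elapsed time between two annotations, if both were recorded.
  def between(from: String, to: String): Option[Long] =
    for (start <- at(from); end <- at(to)) yield end - start
}

// Trace: a set of spans all associated with the same request.
final case class Trace(id: Long, spans: Seq[Span])

object SpanExample {
  // Timings taken from the span slide; client host and service are made up.
  val span: Span = Span(
    name = "getTimeline",
    id = 2L,
    traceId = 1L,
    annotations = Seq(
      Annotation(0L, "Client Send", "10.0.0.1", "web"),
      Annotation(10L, "Server Receive", "135.34.53.2", "timelineservice"),
      Annotation(20L, "Read 30 kbytes from file", "135.34.53.2", "timelineservice"),
      Annotation(90L, "Server Send", "135.34.53.2", "timelineservice"),
      Annotation(100L, "Client Receive", "10.0.0.1", "web")
    )
  )

  val trace: Trace = Trace(1L, Seq(span))
}
```

With these numbers, `SpanExample.span.between("Client Send", "Client Receive")` gives the 100 ms seen by the caller, `between("Server Receive", "Server Send")` gives the 80 ms spent in the server, and the remaining 20 ms is time on the wire in both directions.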

Editor's Notes

  • Before we get into what Zipkin is: why we created it.
  • Shit is slow, you lose users and money.
  • Simplifying wildly, there are two parts to web performance. Front end: the order assets are loaded, minifying and other tricks. Back end: how quickly we can generate and push out the HTML/JSON/whatever.
  • For the front end we have these nice development tools. They show us how the assets loaded, what the page was waiting for, and so on. We want a fancy tool like this for the back end. Picked a page where the server side was unusually bad; normally it’s mostly front end.
  • For the back end we only had these per-service graphs to look at. Nothing ties them together.
  • Capture information about how services in a datacentre are working together to respond to a request.
  • Read this paper from Google called Dapper.
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?
  • Mention dependency on finagle-zipkin.
  • Trace view: services on the left, time scale on top. Least impactful parts of the trace are collapsed automatically. Mention bootstrap and dj.
  • It’s all open source, check it out now.
