Zipkin - Strangeloop

Zipkin is a distributed tracing system that helps us gather timing data for all the disparate services at Twitter, and manages collection and lookup of data through a Collector and a Query service. With Zipkin, we can trace a subset of all requests made to the site, and collect detailed data about the path taken through our systems, as well as timings. Then, we can visualize and ultimately pinpoint where and possibly why a response took longer than expected.

Speaker notes:
  • Before we get into what Zipkin is: why we created it.
  • When the site is slow, you lose users and money.
  • Simplifying wildly, there are two parts to web performance:
    Front end: the order in which assets are loaded, minification, and other tricks.
    Back end: how quickly we can generate and push out the HTML/JSON/whatever.
  • For the front end we have nice development tools that show how the assets loaded, what each one was waiting for, and so on.
    We want a tool like this for the back end.
    We picked a page where the server side was unusually bad; normally most of the time is spent on the front end.
  • For the back end we only had per-service graphs like these to look at, with nothing that ties them together.
  • Capture information about how the services in a datacentre work together to respond to a request.
  • We read a paper from Google called Dapper.
  • Mention that Cassandra and Scribe can both be replaced. ZooKeeper coordination?
  • Mention the dependency on finagle-zipkin.
  • Trace view: services on the left, time scale on top. The least impactful parts of the trace are collapsed automatically.
    Mention Bootstrap and dj.
  • It’s all open source; check it out now.

    1. ZIPKIN: A distributed tracing framework
    2. Why Zipkin?
    4. Google: 0.5 sec slower, 20% traffic drop
       1.5 sec faster, CTR up 12%
       @skr | @thisisfranklin
    5. Performance matters
    7. Front end / Back end
    11. Picture from http://www.flickr.com/photos/jpellgen
    13. • Collects traces from production requests
        • Low overhead
        • Minimum of extra work for developers
    14. Finagle: “Finagle is an asynchronous network stack for the JVM that you can use to build asynchronous Remote Procedure Call (RPC) clients and servers in Java, Scala, or any JVM-hosted language.” (github.com/twitter/finagle)
    15. What to capture?
    24. Diagram: Finagle HTTP service, Finagle Thrift service, Service
    25. Zipkin terminology
        ‣ Annotation: string data associated with a particular timestamp, service, and host. Example:
          time: 2012-01-21 22:37:01
          value: “something happened”
          server: 135.34.53.2
          service: “timelineservice”
    31. ‣ Span: represents one specific method call; made up of a set of annotations. Has a name and an id. Example span:
          T:0ms    Client Send
          T:10ms   Server Receive
          T:20ms   Read 30 kbytes from file
          T:90ms   Server Send
          T:100ms  Client Receive
        ‣ Trace: a set of spans all associated with the same request
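The span example above can be expressed as a small data model. This is a minimal sketch, not Zipkin's own classes: `SpanAnnotation`, `Span`, `Trace`, the caller's host and service names and the span name `getTimeline` are hypothetical, timestamps are milliseconds relative to Client Send, and the four core events use the short codes `cs`, `sr`, `ss` and `cr` that Zipkin uses for Client Send, Server Receive, Server Send and Client Receive. It derives how long the call took as seen by each side.

```scala
// Hypothetical model of Zipkin's terms; not the real Zipkin classes.
final case class SpanAnnotation(timestampMs: Long, value: String, host: String, service: String)

final case class Span(traceId: Long, id: Long, parentId: Option[Long], name: String,
                      annotations: Seq[SpanAnnotation]) {
  // Timestamp of the first annotation with the given value, e.g. "cs" (Client Send).
  private def at(value: String): Option[Long] =
    annotations.find(_.value == value).map(_.timestampMs)

  // Time observed by the caller: Client Receive minus Client Send.
  def clientDurationMs: Option[Long] =
    for (cs <- at("cs"); cr <- at("cr")) yield cr - cs

  // Time spent inside the callee: Server Send minus Server Receive.
  def serverDurationMs: Option[Long] =
    for (sr <- at("sr"); ss <- at("ss")) yield ss - sr
}

// A trace is the set of spans that share one trace id.
final case class Trace(spans: Seq[Span]) {
  require(spans.map(_.traceId).distinct.size <= 1, "all spans must share a trace id")
}

object SpanExample {
  // Hypothetical caller; the callee matches the annotation example on the slides.
  private val clientHost = "10.0.0.1"
  private val clientService = "webservice"
  private val serverHost = "135.34.53.2"
  private val serverService = "timelineservice"

  val span: Span = Span(traceId = 42L, id = 1L, parentId = None, name = "getTimeline",
    annotations = Seq(
      SpanAnnotation(0L, "cs", clientHost, clientService),
      SpanAnnotation(10L, "sr", serverHost, serverService),
      SpanAnnotation(20L, "Read 30 kbytes from file", serverHost, serverService),
      SpanAnnotation(90L, "ss", serverHost, serverService),
      SpanAnnotation(100L, "cr", clientHost, clientService)))
}
```

With the slide's numbers the client sees 100 ms and the server 80 ms; the remaining 20 ms was spent outside the server's own processing (for example on the network or in queues), which is the kind of gap a trace view makes visible.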
    39. Finagle HTTP service:
        • Generate a random i64 trace id
        • Decide if we should sample the trace or not
        Call to the Finagle Thrift service:
        • Generate new span id
        • Pass trace header:
            struct RequestHeader {
              i64 trace_id,
              i64 span_id,
              optional i64 parent_span_id,
              optional bool sampled
            }
        Finagle Thrift service:
        • Thrift service adopts trace id from thrift header if it exists
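The propagation steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not Finagle's implementation: `RequestHeader` mirrors the fields of the slide's Thrift struct, while `Propagation`, `newRoot`, `child`, `adopt` and the `sampleRate` parameter are hypothetical names. The sampling decision is made once, at the entry point, and then carried in `sampled` so that every downstream service agrees on whether to record the trace.

```scala
import scala.util.Random

// Mirrors the RequestHeader fields shown on the slide; a sketch, not Finagle's code.
final case class RequestHeader(traceId: Long, spanId: Long,
                               parentSpanId: Option[Long], sampled: Option[Boolean])

object Propagation {
  // Entry point (e.g. the HTTP service): random i64 trace id and root span id,
  // no parent, and the sampling decision made once for the whole trace.
  def newRoot(rng: Random, sampleRate: Double): RequestHeader =
    RequestHeader(
      traceId = rng.nextLong(),
      spanId = rng.nextLong(),
      parentSpanId = None,
      sampled = Some(rng.nextDouble() < sampleRate))

  // Outgoing call: keep the trace id and sampling decision, mint a new span id,
  // and record the current span as the new span's parent.
  def child(current: RequestHeader, rng: Random): RequestHeader =
    current.copy(spanId = rng.nextLong(), parentSpanId = Some(current.spanId))

  // Receiving service: adopt the incoming header if present, otherwise start a new trace.
  def adopt(incoming: Option[RequestHeader], rng: Random, sampleRate: Double): RequestHeader =
    incoming.getOrElse(newRoot(rng, sampleRate))
}
```

Deciding once at the root, rather than per service, is one simple way to meet the "low overhead" goal: an unsampled request carries `sampled = Some(false)` everywhere and no service records anything for it.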
    45. Diagram: Finagle HTTP service and Finagle Thrift services (each marked “S”), Zipkin collector, Cassandra, Zipkin Query, Zipkin UI
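The speaker notes point out that Cassandra and Scribe can both be replaced. One way to picture that is storage behind an interface that the collector writes to and the query service reads from. This is a minimal, in-memory illustration under that assumption: `StoredSpan`, `SpanStore`, `InMemorySpanStore`, `Collector` and `Query` are hypothetical names rather than Zipkin's classes, and the transport step (Scribe in the notes) is reduced to a plain method call.

```scala
import scala.collection.mutable

// Hypothetical span record; only the fields this sketch needs.
final case class StoredSpan(traceId: Long, spanId: Long, parentId: Option[Long],
                            service: String, name: String)

// Storage hidden behind an interface, so another backend could stand in for Cassandra.
trait SpanStore {
  def store(span: StoredSpan): Unit
  def spansFor(traceId: Long): Seq[StoredSpan]
}

// In-memory stand-in for the real storage backend, indexed by trace id.
final class InMemorySpanStore extends SpanStore {
  private val byTrace = mutable.Map.empty[Long, Vector[StoredSpan]]

  def store(span: StoredSpan): Unit = synchronized {
    byTrace(span.traceId) = byTrace.getOrElse(span.traceId, Vector.empty) :+ span
  }

  def spansFor(traceId: Long): Seq[StoredSpan] = synchronized {
    byTrace.getOrElse(traceId, Vector.empty)
  }
}

// Collector: receives spans from the transport (assumed here) and writes them to the store.
final class Collector(store: SpanStore) {
  def receive(spans: Seq[StoredSpan]): Unit = spans.foreach(store.store)
}

// Query: reassembles one trace for the UI, root spans (no parent) first, then by span id.
final class Query(store: SpanStore) {
  def trace(traceId: Long): Seq[StoredSpan] =
    store.spansFor(traceId).sortBy(s => (s.parentId.isDefined, s.spanId))
}
```

Because both `Collector` and `Query` depend only on `SpanStore`, swapping the backend means writing one new implementation of that trait; the rest of the pipeline is unchanged.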
    46. Finagle ♥ Zipkin
    47. Finagle Thrift server
          ServerBuilder()
            .bindTo(address)
            .codec(ThriftServerFramedCodec())
            .name("servicename")
            .build(someService)
    51. ServerBuilder()
          .bindTo(address)
          .codec(ThriftServerFramedCodec())
          .name("servicename")
          .tracerFactory(ZipkinTracer()) // the one added line: report this server's spans to Zipkin
          .build(someService)
    52. ClientBuilder()
          .cluster(hosts)
          .codec(ThriftClientFramedCodec())
          .name("clientname")
          .tracerFactory(ZipkinTracer()) // same tracer on the client side
          .build() // a client builder takes no service; build() returns one
    55. Trace.record("doing stuff")
        Resulting annotation:
          time: 2012-01-21 22:37:01
          value: “doing stuff”
          server: 135.34.53.2
          service: “timelineservice”
    57. Trace.recordBinary("key", data)
        Key             Value
        responsecode    500
        cache:somekey   Hit
        sql.query       select *...
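To make the difference between the two calls concrete, here is a toy tracer that records both kinds of annotation for a single span. It is a hypothetical stand-in, not Finagle's `Trace` object: `ToyTracer`, `TimedAnnotation` and `BinaryAnnotation` are illustrative names, binary values are kept as strings for readability (an assumption; real binary annotations carry raw data), and the clock is passed in so the output is deterministic.

```scala
import scala.collection.mutable.ListBuffer

// Timestamped event, the kind of data a record("...") call produces.
final case class TimedAnnotation(timestampMs: Long, value: String, host: String, service: String)

// Key/value pair with no timestamp, the kind of data a recordBinary("key", data) call produces.
final case class BinaryAnnotation(key: String, value: String, host: String, service: String)

// Hypothetical tracer for one span; every annotation is tagged with this host and service.
final class ToyTracer(host: String, service: String, clock: () => Long) {
  val timed: ListBuffer[TimedAnnotation] = ListBuffer.empty
  val binary: ListBuffer[BinaryAnnotation] = ListBuffer.empty

  // Something happened at a point in time.
  def record(message: String): Unit = {
    timed += TimedAnnotation(clock(), message, host, service)
  }

  // Extra context about the span, such as a response code or a cache result.
  def recordBinary(key: String, value: String): Unit = {
    binary += BinaryAnnotation(key, value, host, service)
  }
}
```

For example, `new ToyTracer("135.34.53.2", "timelineservice", () => System.currentTimeMillis())` followed by `record("doing stuff")` and `recordBinary("responsecode", "500")` yields one timed and one key/value annotation, matching the two slides above.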
    58. Platform    Protocol   Client   Server
        Finagle     Thrift     Yes      Yes
        Finagle     HTTP       Yes      Yes
        Finagle     Memcache   Yes      No
        Finagle     Redis      Yes      No
        Cassie      Thrift     Yes      No
        Querulous   JDBC       Yes      No
        Ruby        Thrift     Yes      Yes
    59. Zipkin UI
    63. What did we find?
    64. github.com/twitter/zipkin @zipkinproject
