Z I P K I NA distributed tracing framework
Why Zipkin?
Google0.5 sec slower20% traffic drop                  @skr | @thisisfranklin   3
Google0.5 sec slower    1.5 sec faster20% traffic drop   CTR up 12%                               @skr | @thisisfranklin   3
Performance matters
Front end            @skr | @thisisfranklin   5
Front end   Back end                   @skr | @thisisfranklin   5
@skr | @thisisfranklin   6
@skr | @thisisfranklin   6
@skr | @thisisfranklin   7
@skr | @thisisfranklin   8Picture from http://www.flickr.com/photos/jpellgen
@skr | @thisisfranklin   9
• Collects traces from production requests• Low overhead• Minimum of extra work for developers                            ...
Finagle“Finagle is an asynchronous network stack for the JVM that you can use to buildasynchronous Remote Procedure Call (...
@skr | @thisisfranklin   11
Finagle Http service                       @skr | @thisisfranklin   11
Finagle Http service                       @skr | @thisisfranklin   11
Finagle Http serviceFinagle Thrift Service                                          @skr | @thisisfranklin   11
Finagle Http serviceFinagle Thrift Service                                          @skr | @thisisfranklin   11
Finagle Http serviceFinagle Thrift Service                                          @skr | @thisisfranklin   11
Finagle Http serviceFinagle Thrift Service                Service                                                @skr | @t...
Finagle Http serviceFinagle Thrift Service                Service                                                @skr | @t...
Finagle Http serviceFinagle Thrift Service                Service                                                @skr | @t...
Zipkin terminology‣   Annotation: string data associated with a particular timestamp, service, and    host                ...
‣   Span: represents one specific method call; made up of a set of annotations.    Has a name and an id.                  ...
‣   Span: represents one specific method call; made up of a set of annotations.    Has a name and an id.                  ...
‣   Span: represents one specific method call; made up of a set of annotations.    Has a name and an id.                  ...
‣   Span: represents one specific method call; made up of a set of annotations.    Has a name and an id.                  ...
‣   Span: represents one specific method call; made up of a set of annotations.    Has a name and an id.                  ...
‣   Span: represents one specific method call; made up of a set of annotations.    Has a name and an id.                  ...
Finagle httpservice          @skr | @thisisfranklin   14
• Generate a random i64 trace id                                   Finagle                                    http        ...
• Generate a random i64 trace id• Decide if we should sample thetrace or not                       Finagle                ...
• Generate a random i64 trace id• Decide if we should sample thetrace or not                       Finagle                ...
• Generate a random i64 trace id• Decide if we should sample thetrace or not                       Finagle                ...
• Generate a random i64 trace id• Decide if we should sample thetrace or not                       Finagle                ...
• Generate a random i64 trace id• Decide if we should sample thetrace or not                       Finagle                ...
• Generate a random i64 trace id• Decide if we should sample thetrace or not                            Finagle           ...
Finagle httpserviceFinagle thriftservice          @skr | @thisisfranklin   14
Finagle           http          serviceFinagle   Finagle thrift    thriftservice   service                    @skr | @this...
Finagle           http          serviceFinagle   Finagle thrift    thriftservice   service                    Finagle     ...
Finagle            http          service          SFinagle   Finagle thrift    thriftSservice   service          S        ...
Finagle            http          service          S                               Zipkin                              coll...
Finagle            http          service          S                               Zipkin                              coll...
Finagle ♥ Zipkin
Finagle Thrift server       ServerBuilder()        .bindTo(address)        .codec(ThriftServerFramedCodec())        .name(...
Finagle Thrift server       ServerBuilder()        .bindTo(address)        .codec(ThriftServerFramedCodec())        .name(...
Finagle Thrift server       ServerBuilder()        .bindTo(address)        .codec(ThriftServerFramedCodec())        .name(...
Finagle Thrift server       ServerBuilder()        .bindTo(address)        .codec(ThriftServerFramedCodec())        .name(...
Finagle Http server       ServerBuilder()        .bindTo(address)        .codec(Http())        .name("servicename")       ...
ServerBuilder() .bindTo(address) .codec(ThriftServerFramedCodec()) .name("servicename") .tracerFactory(ZipkinTracer()) .bu...
ClientBuilder() .cluster(hosts) .codec(ThriftServerFramedCodec()) .name("clientname") .tracerFactory(ZipkinTracer()) .buil...
Trace.record("doing stuff")                              @skr | @thisisfranklin   20
Trace.record("doing stuff")                              @skr | @thisisfranklin   20
Trace.record("doing stuff")   time: 2012-01-21 22:37:01   value: “doing stuff”   server: 135.34.53.2   service: “timelines...
Trace.recordBinary("key", data)                                  @skr | @thisisfranklin   21
Key           ValueTrace.recordBinary("key", data)   responsecode       500                                  cache:somekey...
Platform    Protocol   Client   Server Finagle     Thrift     Yes      Yes Finagle     HTTP       Yes      Yes Finagle    ...
Zipkin UI
@skr | @thisisfranklin   24
@skr | @thisisfranklin   24
@skr | @thisisfranklin   24
What did we find?
github.com/twitter/zipkin     @zipkinproject                            @skr | @thisisfranklin   27
Zipkin - Runtime open house
Zipkin - Runtime open house
Upcoming SlideShare
Loading in …5
×

Zipkin - Runtime open house

4,970 views

Published on

Zipkin is a distributed tracing system that helps us gather timing data for all the disparate services at Twitter, and manages collection and lookup of data through a Collector and a Query service.
With Zipkin, we can trace a subset of all requests made to the site, and collect detailed data about the path taken through our systems, as well as timings. Then, we can visualize and ultimately pinpoint where and possibly why a response took longer than expected.

Published in: Technology, Education
0 Comments
16 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
4,970
On SlideShare
0
From Embeds
0
Number of Embeds
134
Actions
Shares
0
Downloads
128
Comments
0
Likes
16
Embeds 0
No embeds

No notes for slide
  • Talk more about what tracing is. Compare in process profiling to distributed system?\n
  • Before we get into what Zipkin is: why we created it\n
  • Shit is slow, you lose users and money\n
  • \n
  • Simplify wildly, there are two parts to web performance\nFront end. The order assets are loaded, minifying and other tricsk\nBack end. How quickly can we generate and push out the HTML/JSON/whatever\n
  • For the front end we have these nice development tools. Shows us how the assets loaded, what it was waiting for and so on.\nWe want a fancy tool like this for the backend.\nPicked a page where server side was unusually bad, normally it’s mostly FE\n
  • For the back end we only had these graphs to look at per service.\nNothing that ties them together\n
  • Capture information about how services in a datacentre is working together to respond to a request. \n
  • Read this paper from Google called Dapper\n
  • Read this paper from Google called Dapper\n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • Mention that Cassandra and Scribe can both be replaced. Zookeeper coordination?\n
  • \n
  • Mention dependency on finagle-zipkin\n
  • Mention dependency on finagle-zipkin\n
  • Mention dependency on finagle-zipkin\n
  • Mention dependency on finagle-zipkin\n
  • Mention dependency on finagle-zipkin\n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • Trace view\nServices on the left. Time scale on top. Least impactful parts of trace collapsed automatically.\nMention bootstrap and dj\n
  • \n
  • It’s all open source, check it out now.\n
  • Zipkin - Runtime open house

    1. 1. Z I P K I NA distributed tracing framework
    2. 2. Why Zipkin?
    3. 3. Google0.5 sec slower20% traffic drop @skr | @thisisfranklin 3
    4. 4. Google0.5 sec slower 1.5 sec faster20% traffic drop CTR up 12% @skr | @thisisfranklin 3
    5. 5. Performance matters
    6. 6. Front end @skr | @thisisfranklin 5
    7. 7. Front end Back end @skr | @thisisfranklin 5
    8. 8. @skr | @thisisfranklin 6
    9. 9. @skr | @thisisfranklin 6
    10. 10. @skr | @thisisfranklin 7
    11. 11. @skr | @thisisfranklin 8Picture from http://www.flickr.com/photos/jpellgen
    12. 12. @skr | @thisisfranklin 9
    13. 13. • Collects traces from production requests• Low overhead• Minimum of extra work for developers @skr | @thisisfranklin 9
    14. 14. Finagle“Finagle is an asynchronous network stack for the JVM that you can use to buildasynchronous Remote Procedure Call (RPC) clients and servers in Java, Scala, orany JVM-hosted language.” github.com/twitter/finagle
    15. 15. @skr | @thisisfranklin 11
    16. 16. Finagle Http service @skr | @thisisfranklin 11
    17. 17. Finagle Http service @skr | @thisisfranklin 11
    18. 18. Finagle Http serviceFinagle Thrift Service @skr | @thisisfranklin 11
    19. 19. Finagle Http serviceFinagle Thrift Service @skr | @thisisfranklin 11
    20. 20. Finagle Http serviceFinagle Thrift Service @skr | @thisisfranklin 11
    21. 21. Finagle Http serviceFinagle Thrift Service Service @skr | @thisisfranklin 11
    22. 22. Finagle Http serviceFinagle Thrift Service Service @skr | @thisisfranklin 11
    23. 23. Finagle Http serviceFinagle Thrift Service Service @skr | @thisisfranklin 11
    24. 24. Zipkin terminology‣ Annotation: string data associated with a particular timestamp, service, and host Time time: 2012-01-21 22:37:01 value: “something happened” server: 135.34.53.2 service: “timelineservice” @skr | @thisisfranklin 12
    25. 25. ‣ Span: represents one specific method call; made up of a set of annotations. Has a name and an id. Time T:0ms Client Send Span @skr | @thisisfranklin 13
    26. 26. ‣ Span: represents one specific method call; made up of a set of annotations. Has a name and an id. Time T:0ms Client Send Span T:10ms Server Receive @skr | @thisisfranklin 13
    27. 27. ‣ Span: represents one specific method call; made up of a set of annotations. Has a name and an id. Time T:0ms Client Send Span T:10ms Server Receive T:90ms Server Send @skr | @thisisfranklin 13
    28. 28. ‣ Span: represents one specific method call; made up of a set of annotations. Has a name and an id. Time T:0ms Client Send T:100ms Client Receive Span T:10ms Server Receive T:90ms Server Send @skr | @thisisfranklin 13
    29. 29. ‣ Span: represents one specific method call; made up of a set of annotations. Has a name and an id. Time T:0ms Client Send T:100ms Client Receive Span T:20ms Read 30 kbytes from file T:10ms Server Receive T:90ms Server Send @skr | @thisisfranklin 13
    30. 30. ‣ Span: represents one specific method call; made up of a set of annotations. Has a name and an id. Time T:0ms Client Send T:100ms Client Receive Span T:20ms Read 30 kbytes from file T:10ms Server Receive T:90ms Server Send‣ Trace: a set of spans all associated with the same request @skr | @thisisfranklin 13
    31. 31. Finagle httpservice @skr | @thisisfranklin 14
    32. 32. • Generate a random i64 trace id Finagle http service @skr | @thisisfranklin 14
    33. 33. • Generate a random i64 trace id• Decide if we should sample thetrace or not Finagle http service @skr | @thisisfranklin 14
    34. 34. • Generate a random i64 trace id• Decide if we should sample thetrace or not Finagle http service Finagle thrift service @skr | @thisisfranklin 14
    35. 35. • Generate a random i64 trace id• Decide if we should sample thetrace or not Finagle http service• Generate new span id Finagle thrift service @skr | @thisisfranklin 14
    36. 36. • Generate a random i64 trace id• Decide if we should sample thetrace or not Finagle http service• Generate new span id• Pass trace header Finagle thrift service @skr | @thisisfranklin 14
    37. 37. • Generate a random i64 trace id• Decide if we should sample thetrace or not Finagle http struct RequestHeader { service i64 trace_id, i64 span_id,• Generate new span id optional i64 parent_span_id,• Pass trace header optional bool sampled } Finagle thrift service @skr | @thisisfranklin 14
    38. 38. • Generate a random i64 trace id• Decide if we should sample thetrace or not Finagle http struct RequestHeader { service i64 trace_id, i64 span_id,• Generate new span id optional i64 parent_span_id,• Pass trace header optional bool sampled } Finagle• Thrift service adopts trace id from thriftheader if it exists service @skr | @thisisfranklin 14
    39. 39. Finagle httpserviceFinagle thriftservice @skr | @thisisfranklin 14
    40. 40. Finagle http serviceFinagle Finagle thrift thriftservice service @skr | @thisisfranklin 14
    41. 41. Finagle http serviceFinagle Finagle thrift thriftservice service Finagle thrift service @skr | @thisisfranklin 14
    42. 42. Finagle http service SFinagle Finagle thrift thriftSservice service S Finagle thrift S service @skr | @thisisfranklin 14
    43. 43. Finagle http service S Zipkin collectorFinagle Finagle thrift thriftSservice service S Cassandra Finagle thrift S service @skr | @thisisfranklin 14
    44. 44. Finagle http service S Zipkin collectorFinagle Finagle thrift thriftSservice service S Zipkin Zipkin Cassandra Query UI Finagle thrift S service @skr | @thisisfranklin 14
    45. 45. Finagle ♥ Zipkin
    46. 46. Finagle Thrift server ServerBuilder() .bindTo(address) .codec(ThriftServerFramedCodec()) .name("servicename") .build(someService) @skr | @thisisfranklin 16
    47. 47. Finagle Thrift server ServerBuilder() .bindTo(address) .codec(ThriftServerFramedCodec()) .name("servicename") .build(someService) @skr | @thisisfranklin 16
    48. 48. Finagle Thrift server ServerBuilder() .bindTo(address) .codec(ThriftServerFramedCodec()) .name("servicename") .build(someService) @skr | @thisisfranklin 16
    49. 49. Finagle Thrift server ServerBuilder() .bindTo(address) .codec(ThriftServerFramedCodec()) .name("servicename") .build(someService) @skr | @thisisfranklin 16
    50. 50. Finagle Http server ServerBuilder() .bindTo(address) .codec(Http()) .name("servicename") .build(someService) @skr | @thisisfranklin 17
    51. 51. ServerBuilder() .bindTo(address) .codec(ThriftServerFramedCodec()) .name("servicename") .tracerFactory(ZipkinTracer()) .build(someService) @skr | @thisisfranklin 18
    52. 52. ClientBuilder() .cluster(hosts) .codec(ThriftServerFramedCodec()) .name("clientname") .tracerFactory(ZipkinTracer()) .build(someService) @skr | @thisisfranklin 19
    53. 53. Trace.record("doing stuff") @skr | @thisisfranklin 20
    54. 54. Trace.record("doing stuff") @skr | @thisisfranklin 20
    55. 55. Trace.record("doing stuff") time: 2012-01-21 22:37:01 value: “doing stuff” server: 135.34.53.2 service: “timelineservice” @skr | @thisisfranklin 20
    56. 56. Trace.recordBinary("key", data) @skr | @thisisfranklin 21
    57. 57. Key ValueTrace.recordBinary("key", data) responsecode 500 cache:somekey Hit sql.query select *... @skr | @thisisfranklin 21
    58. 58. Platform Protocol Client Server Finagle Thrift Yes Yes Finagle HTTP Yes Yes Finagle Memcache Yes No Finagle Redis Yes No Cassie Thrift Yes NoQuerulous JDBC Yes No Ruby Thrift Yes Yes @skr | @thisisfranklin 22
    59. 59. Zipkin UI
    60. 60. @skr | @thisisfranklin 24
    61. 61. @skr | @thisisfranklin 24
    62. 62. @skr | @thisisfranklin 24
    63. 63. What did we find?
    64. 64. github.com/twitter/zipkin @zipkinproject @skr | @thisisfranklin 27

    ×