distributed tracing in 5 minutes

lightning talk from surge 2012

Published in: Technology

  1. distributed tracing
  2. twitter zipkin, google dapper, x-trace, tracelytics, ... more!
  3. motivation
  4-5. what is slow?
  6-7. causal flow of control
  8. how to
  9-16. possible approaches
      • Unique identifier
          • propagate throughout
          • write instrumentation for various transports
      • Observe and correlate
          • always on the outside - black box
          • difficult to get threaded + evented processes right
  17. 1BD57B58AE7E315BBEAB6795F0BDC198296357
  18-32. [diagram] t = start: nginx, cache, python, db, internet, the java (t = end appears on slide 32)
  33. piggyback rides
      • More Doable
          • HTTP: x-headers
          • Thrift: secret argument
          • Internal RPC protocol: you’re the boss
      • Less Doable
          • SQL: one way ticket, also you’re not percona
          • memcache: not extensible so not backwards compatible
  34. [diagram] t = start: nginx, cache, python, db, internet, the java, t = end
  35. timing and structure
      • Timing
          • distributed = clock skew
      • Structure -- two approaches
          • Encode in ID
          • Encode in back-pointers
  36. encode in ID?
      • nginx1
      • nginx1python1
      • nginx1python1cache1
      • nginx1python1cache1python2
      • nginx1python1cache1python2sql1
      • nginx1python1cache1python2sql1python3
      • ...
  37. encode in back-pointer? [diagram] nginx, python, cache, python
  38-39. reporting
  40. other things worth figuring out
      • sampling
      • reporting
      • aggregate analysis
  41. thanks!
      • tracelytics.com
      • @dankosaur
      • dan@tracelytics.com
  42. resources
      • X-Trace: http://x-trace.net
          • http://x-trace.net/pubs/xtr-nsdi07.pdf
      • Google Dapper: http://research.google.com/pubs/pub36356.html
      • Twitter Zipkin: https://github.com/twitter/zipkin
      • CMU PDL: www.pdl.cmu.edu
          • StarDust: http://www.pdl.cmu.edu/PDL-FTP/SelfStar/thereska_sigmetrics06.pdf
          • Trace Diff: http://www.pdl.cmu.edu/PDL-FTP/SelfStar/NSDI11.pdf
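
The "possible approaches", "piggyback rides" and "encode in back-pointer?" slides can be tied together in a small sketch. This is not code from the talk or from any of the listed systems; it is a minimal illustration, assuming a unique trace ID that is propagated in HTTP x-headers and a parent pointer recorded with every event. The header names `X-Trace-Id` and `X-Parent-Event`, the `handle` function, and the in-memory `events` list are all hypothetical stand-ins.

```python
import uuid

# Hypothetical header names; a real deployment would pick its own.
TRACE_HEADER = "X-Trace-Id"
PARENT_HEADER = "X-Parent-Event"

# In-memory stand-in for a reporting backend (assumption for this sketch).
events = []


def new_id():
    """Return a random hex identifier."""
    return uuid.uuid4().hex


def record(trace_id, layer, parent_id):
    """Report one event with a back-pointer to the event that caused it."""
    event_id = new_id()
    events.append(
        {"trace": trace_id, "id": event_id, "parent": parent_id, "layer": layer}
    )
    return event_id


def inject(headers, trace_id, event_id):
    """Piggyback the trace ID and the caller's event ID on outgoing headers."""
    out = dict(headers)
    out[TRACE_HEADER] = trace_id
    out[PARENT_HEADER] = event_id
    return out


def extract(headers):
    """Read the trace ID and parent event ID from incoming headers, if any."""
    return headers.get(TRACE_HEADER), headers.get(PARENT_HEADER)


def handle(layer, headers, downstream=()):
    """Simulate one layer handling a request and calling its downstream layers."""
    trace_id, parent_id = extract(headers)
    if trace_id is None:
        trace_id = new_id()  # first layer to see the request starts the trace
    event_id = record(trace_id, layer, parent_id)
    for next_layer, next_downstream in downstream:
        handle(next_layer, inject(headers, trace_id, event_id), next_downstream)
    return trace_id


def paths(trace_id):
    """Rebuild the causal path to every event by following back-pointers."""
    by_id = {e["id"]: e for e in events if e["trace"] == trace_id}
    result = []
    for event in by_id.values():
        chain = []
        current = event
        while current is not None:
            chain.append(current["layer"])
            current = by_id.get(current["parent"])
        result.append(" > ".join(reversed(chain)))
    return sorted(result)


# nginx calls python, which calls cache and db.
trace = handle("nginx", {}, [("python", [("cache", []), ("db", [])])])
print(paths(trace))
```

Structure here comes from the back-pointers alone, so no timestamps are compared across machines and clock skew does not affect the reconstructed order. Encoding the path in the ID itself, as on the "encode in ID?" slide, would remove the lookup in `paths` at the cost of an ID that grows with every hop.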
