A microservices architecture gives rise to a range of benefits, including independent scaling and independent deployment of services. However, it also introduces challenges around configuration management, load balancing, and latency analysis. Reshmi Krishna discusses how companies like Twitter analyze microservice latency in real time and demonstrates how to integrate popular distributed tracing tools like Zipkin into an existing application with just a few lines of code. At the end, we will also see a demo of the tracing capabilities in PCF Metrics.
13. Troubleshooting Latency Issues
When was the event? How long did it take?
How do I know it was slow?
Why did it take so long?
Which microservice was responsible?
14. Distributed Tracing
Distributed Tracing is the process of collecting end-to-end transaction graphs in near real time
A trace represents the entire journey of a request
A span represents a single operation call
Distributed tracing systems are often used for this purpose; Zipkin is one example
As a request flows from one microservice to another, tracers add logic to create unique trace and span IDs
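To make this concrete, here is a minimal, hand-written Java sketch of what a tracer does under the hood, assuming Zipkin's B3 HTTP headers (X-B3-TraceId, X-B3-SpanId, X-B3-ParentSpanId). In practice an instrumentation library such as Spring Cloud Sleuth injects these headers for you; the class and method names below are invented for the illustration.

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.ThreadLocalRandom;

// Illustration only: manual propagation of Zipkin B3 headers between services.
// Real tracers (e.g. Spring Cloud Sleuth) do this automatically.
public class B3Propagation {

    // 64-bit IDs rendered as 16-character lower-hex strings.
    static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    // Call a downstream service: reuse the incoming trace ID (or mint one for
    // the very first request) and create a fresh span ID for this hop.
    static void callDownstream(String incomingTraceId, String incomingSpanId,
                               String downstreamUrl) throws Exception {
        String traceId = (incomingTraceId != null) ? incomingTraceId : newId();
        String spanId = newId();

        HttpURLConnection conn =
                (HttpURLConnection) new URL(downstreamUrl).openConnection();
        conn.setRequestProperty("X-B3-TraceId", traceId);
        conn.setRequestProperty("X-B3-SpanId", spanId);
        if (incomingSpanId != null) {
            // The caller's span becomes the parent of the new span.
            conn.setRequestProperty("X-B3-ParentSpanId", incomingSpanId);
        }
        conn.getResponseCode(); // fire the request; a tracer would also record timing here
    }
}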
15. Tracers
Tracers add logic to create unique trace IDs
A trace ID is generated when the first request is made
A span ID is generated as the request arrives at each microservice
An example tracer is Spring Cloud Sleuth
Tracers execute in your production apps! They are written not to log too much
Tracers have an instrumentation or sampling policy to manage the volume of traces and spans
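As a rough sketch of how small the integration is, the snippet below assumes Spring Cloud Sleuth 1.x (current at the time of this talk) with spring-cloud-starter-sleuth and spring-cloud-sleuth-zipkin on the classpath; the Sampler and AlwaysSampler types come from that release line, and later versions renamed the sampling classes and properties. With Sleuth on the classpath, log lines typically gain a [appname,traceId,spanId,exportable] prefix with no further code.

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.sleuth.Sampler;
import org.springframework.cloud.sleuth.sampler.AlwaysSampler;
import org.springframework.context.annotation.Bean;

// Sketch: a Spring Boot app with Sleuth 1.x on the classpath. The only code
// added for tracing is the sampler bean, which overrides the default
// sampling policy so that every span is exported to Zipkin.
@SpringBootApplication
public class TracedApplication {

    @Bean
    public Sampler defaultSampler() {
        // Export every span; in production you would normally sample only a
        // percentage of requests to keep the tracer's overhead low.
        return new AlwaysSampler();
    }

    public static void main(String[] args) {
        SpringApplication.run(TracedApplication.class, args);
    }
}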
16. Visualization - Traces & Spans
service1: Trace Id 1, Span Id 1
service2: Trace Id 1, Parent Id 1, Span Id 2
service3: Trace Id 1, Parent Id 2, Span Id 3
service4: Trace Id 1, Parent Id 2, Span Id 4
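Purely as an illustration (not Zipkin's actual code), this plain-Java sketch rebuilds the trace tree from the IDs on this slide: each span carries the shared trace ID, its own span ID, and the span ID of its parent.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: rebuilding the trace tree shown on this slide.
public class TraceTree {

    static class Span {
        final String service;
        final int traceId;      // shared by every span in the trace
        final Integer parentId; // null for the root span
        final int spanId;
        Span(String service, int traceId, Integer parentId, int spanId) {
            this.service = service; this.traceId = traceId;
            this.parentId = parentId; this.spanId = spanId;
        }
    }

    public static void main(String[] args) {
        // The four spans from the slide: all share trace ID 1.
        List<Span> spans = Arrays.asList(
                new Span("service1", 1, null, 1),
                new Span("service2", 1, 1, 2),
                new Span("service3", 1, 2, 3),
                new Span("service4", 1, 2, 4));

        // Group spans under their parent span ID.
        Map<Integer, List<Span>> children = new HashMap<>();
        for (Span s : spans) {
            children.computeIfAbsent(s.parentId, k -> new ArrayList<>()).add(s);
        }

        // Walk the tree from the root (the span without a parent).
        print(children.get(null).get(0), children, "");
        // Prints:
        // service1 (span 1)
        //   service2 (span 2)
        //     service3 (span 3)
        //     service4 (span 4)
    }

    static void print(Span span, Map<Integer, List<Span>> children, String indent) {
        System.out.println(indent + span.service + " (span " + span.spanId + ")");
        for (Span child : children.getOrDefault(span.spanId, new ArrayList<Span>())) {
            print(child, children, indent + "  ");
        }
    }
}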
17. Dapper Paper By Google
This paper describes Dapper, which is Google's production distributed systems tracing infrastructure
Design Goals:
Low overhead
Application-level transparency
Scalability
18. Zipkin
Zipkin is a distributed tracing system
Its implementation is based on Google's Dapper paper
Aggregates spans into trace trees
Manages both collection and lookup of the data
In 2015, OpenZipkin became the primary fork
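As a small, hedged example of the "lookup" side: assuming a Zipkin server on its default port 9411 and the v1 HTTP query API that shipped around the time of this talk (endpoint and parameter names differ in newer Zipkin versions), recent traces for a service can be fetched like this:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: query a locally running Zipkin server for traces involving
// "service1". Assumes the Zipkin v1 API; adjust the path for newer versions.
public class ZipkinLookup {
    public static void main(String[] args) throws Exception {
        URL url = new URL(
                "http://localhost:9411/api/v1/traces?serviceName=service1&limit=10");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                // The response is a JSON array of traces, each a list of spans.
                System.out.println(line);
            }
        }
    }
}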
20. Demo: Architecture Diagram
[Architecture diagram: four apps, each instrumented with Spring Cloud Sleuth, send spans over a transport (MQ/HTTP/log) to the Zipkin collector; the collector writes them to the span store, and the query server and Zipkin UI read traces back out.]
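A minimal sketch of how one of the APP boxes above could point its Sleuth instrumentation at the Zipkin collector over HTTP, assuming Spring Cloud Sleuth 1.x with spring-cloud-sleuth-zipkin on the classpath (spring.zipkin.baseUrl is that version's property; in the demo the same setting could live in application properties, and an MQ transport is also possible via Sleuth's stream module):

import java.util.Collections;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// Sketch: an instrumented app that reports spans to a Zipkin collector
// listening on localhost:9411 over HTTP. Sleuth itself needs no code here;
// the property only tells it where the collector lives.
@SpringBootApplication
public class DemoApp {
    public static void main(String[] args) {
        SpringApplication app = new SpringApplication(DemoApp.class);
        app.setDefaultProperties(Collections.singletonMap(
                "spring.zipkin.baseUrl", "http://localhost:9411"));
        app.run(args);
    }
}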
22. Links
Dapper, Google : http://research.google.com/pubs/pub36356.html
Code for this presentation : https://github.com/reshmik/DistributedTracingDemo_Velocity2016.git
Sleuth’s documentation: http://cloud.spring.io/spring-cloud-sleuth/spring-cloud-sleuth.html
Repo with Spring Boot Zipkin server: https://github.com/openzipkin/zipkin-java
Zipkin deployed as a PCF app : https://github.com/spring-cloud-samples/sleuth-documentation-apps/tree/master/zipkin-server
Pivotal Web Services trial : https://run.pivotal.io/
Pivotal Cloud Foundry on your laptop : https://docs.pivotal.io/pcf-dev/
@reshmi9k
Editor's Notes
A monolith usually looks like a big ball of mud with entangled dependencies, lack of cohesion, and direct DB queries instead of using interfaces and APIs. It does NOT do one thing very well. It usually does a lot of things, which become brittle and difficult to reason about.
All functionality must be deployed together
No Language and framework heterogeneity
A failure is more likely to cascade, reducing resilience - brittle - high-risk deployments
Scale vertically or limited horizontal scaling of everything at once
Large team - anti agile
Harder to reuse
Harder to modify - thousands of lines of hard to understand code
Harder to replace - mean time to recovery is limited
Harder for new team members to get up to speed
Wikipedia: A big ball of mud is a software system that lacks a perceivable architecture. Although undesirable from a software engineering point of view, such systems are common in practice due to business pressures, developer turnover and code entropy. They are a type of design anti-pattern.
Death Star architecture by Adrian Cockcroft
As visualized by App Dynamics, Boundary.com and Twitter internal tools
A trace represents the entire journey of a request
A span is a basic unit of work
A span is identified by a unique 64-bit ID
A trace is identified by another 64-bit ID, which every span in that trace carries
A span contains timestamped records, any RPC timing data, and zero or more application-specific annotations
The trace gives you the structure through which you can identify your calls. You can think of a trace as a tree and of the tree nodes as spans.
The edges indicate a causal relationship between a span and its parent span. Independent of its place in a larger trace tree, though, a span is also a simple log of timestamped records which encode the span's start and end time, any RPC timing data, and zero or more application-specific annotations
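To make those timestamped records concrete: Zipkin's classic RPC annotations are cs (client send), sr (server receive), ss (server send) and cr (client receive), and latency breakdowns fall out of simple subtraction. The timestamps below are invented for the example (Zipkin records them in microseconds).

import java.util.LinkedHashMap;
import java.util.Map;

// Illustration: the four classic RPC annotations on one span and what can be
// derived from them. The numbers are made up.
public class SpanTimings {
    public static void main(String[] args) {
        Map<String, Long> annotations = new LinkedHashMap<>();
        annotations.put("cs", 0L);    // client send: the caller issues the request
        annotations.put("sr", 30L);   // server receive: request arrives at the service
        annotations.put("ss", 250L);  // server send: the service responds
        annotations.put("cr", 280L);  // client receive: the caller gets the response

        long total = annotations.get("cr") - annotations.get("cs");   // latency the caller saw
        long server = annotations.get("ss") - annotations.get("sr");  // time spent in the service
        long network = total - server;                                // time spent on the wire

        System.out.println("total=" + total + " server=" + server + " network=" + network);
    }
}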
Dapper was published in 2010
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36356.pdf
Zipkin is a distributed tracing system. It helps gather timing data needed to troubleshoot latency problems in microservice architectures. It manages both the collection and lookup of this data. Zipkin’s design is based on the Google Dapper paper.
Started as a project in Twitter's first Hack Week.
The initial version implemented the Dapper paper's design for Thrift requests.
Today it has grown to include support for tracing HTTP, Thrift, Memcache, SQL and Redis requests.
The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages.
Tracers collect timing data and transport it over HTTP or Kafka. We use Scribe to transport all the traces from the different services to Zipkin and Hadoop. Scribe was developed by Facebook and it’s made up of a daemon that can run on each server in your system. It listens for log messages and routes them to the correct receiver depending on the category.
Once the trace data arrives at the Zipkin collector daemon we check that it's valid, store it, and index it for lookups.
Zipkin was originally built with Cassandra for storage. It was scalable, had a flexible schema, and is heavily used within Twitter. However, this component is now pluggable, and now we have support for Redis, HBase, MySQL, PostgreSQL, SQLite, and H2.
Users query for traces via Zipkin's web UI or API.