SlideShare a Scribd company logo
1 of 70
Download to read offline
©2008–22 New Relic, Inc. All rights reserved
Daniel Kim, Lead Developer Relations Engineer @ New Relic
Kafka
Tracing
with
OpenTelemetry
©2008–22 New Relic, Inc. All rights reserved
Daniel Kim
Lead Developer Advocate, New Relic
- Likes spicy food (hotpot)
- Likes spicy talks
- Likes spicy tweets
- Follow me on @learnwdaniel
©2008–22 New Relic, Inc. All rights reserved
Agenda
- Why trace Kafka
- What is Distributed Tracing?
- What is OpenTelemetry?
- Implementing OpenTelemetry with Kafka
- Live Demo
©2008–22 New Relic, Inc. All rights reserved
I decided to do a live demo
©2008–22 New Relic, Inc. All rights reserved
Live Demo
Kafka Producer
Kafka Consumer
©2008–22 New Relic, Inc. All rights reserved
We need test data - so please ask questions!
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Trace
Why
Kafka?
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
©2008–22 New Relic, Inc. All rights reserved
NRDB
200 petabytes a month
= 9,000,000 records/sec
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
HTTP Requests
NRDB
(Durable Storage)
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Consumer
Microservice
Event 1
Event 2
Event 3 Consumer
Microservice
Consumer
Microservice
Partition
Topic
Broker
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Aggregate
Normalize
Decorate
Transform
NRDB
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Aggregate
Normalize
Decorate
Transform
NRDB
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Kafka retains messages for a
set amount of time so
consumers can potentially go
down or lag behind incoming
data and recover without
data loss
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Why do you need to trace Kafka?
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Agent
How do we optimize “time to glass”?
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Service
Service
Service
Service NRDB
1. Debug Bottlenecks in the ingest pipeline
©2008–22 New Relic, Inc. All rights reserved
Service
Service
[Canary]
Service
Service
2. Get visibility into Downstream impacts of your service
©2008–22 New Relic, Inc. All rights reserved
Service
Service]
Service
Service
3. Measure Data Loss
©2008–22 New Relic, Inc. All rights reserved
Service A
4. Fine Tune Kafka Config (Handling BackPressure)
Consumer
Producer
©2008–22 New Relic, Inc. All rights reserved
Service A
Consumer
Producer
4. Fine Tune Kafka Config (Handling BackPressure)
©2008–22 New Relic, Inc. All rights reserved
Producer
Batch 1
5. Fine Tune Kafka Config (Batching/Compressing Messages)
Message 1
Message 2
Message 3
Message 4
Message 5
Message 6
📦 batch.size = 6
Compression
©2008–22 New Relic, Inc. All rights reserved
Producer
Batch 2
5. Fine Tune Kafka Config (Batching/Compressing Messages)
Message 1
Message 2
Message 3
📦 batch.size = 6
⌛linger.ms = 20
📦 batch.size = 6
Compression
©2008–22 New Relic, Inc. All rights reserved
is
What
Distributed Tracing?
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
What is Distributed Tracing?
User
My Awesome
API
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
What is Distributed Tracing?
User
My Awesome
API
…. 13 SECONDS LATER
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
What is Distributed Tracing?
User
My Awesome API
…. 13 SECONDS LATER
A B
D C
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
How do we connect the dots?
Context
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Context in action
My Awesome API
A
B
We only see one side of the
transaction.
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Context in action
My Awesome API
A .Here A is sending a payload to
somewhere.
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Context in action
My Awesome API
B
Here B is receiving a payload
from somewhere
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Context in action
My Awesome API
A
B
Each Individual Span gets
its own id
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Context in action
My Awesome API
A
B
Each Individual Span gets
its own id (transaction)
And the whole trace gets its
own id (parentid)
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Context in action
My Awesome API
A
B
Each Individual Span gets
its own id (transaction)
And the whole trace gets its
own id (parentid)
Both data points get
encoded as a set of HTTP
headers
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Context in action
My Awesome API
A
B
transaction=12H4
parentid=x1Fd
Context
x1Fd
Overall Trace 12H4
And the parent span is x1FD
h4a5
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Context in action
My Awesome API
A
B
transaction=12H4
parentid=x1Fd
Received by A
x1Fd
Overall Trace 12H4
And the parent span is x1FD
h4a5
x1Fd
h4a5
12H4
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
A
B
What if we have microservices with
different languages or frameworks?
What if microservices are using
different observability systems?
What if we have decoupled
microservices via Kafka?
B
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
W3C standard
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
A C
X-NEWRELIC : mY1d3
B
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
A C
TraceParent: …
TraceState: NR=...
TraceParent: …
TraceState: NR=...
B
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
OSS Standard for Context Propagation
traceparent
tracestate
A
B
W3C Trace Context
enables cross-vendor
interoperation of traces
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Adoption of W3C Tracing standard
https://github.com/w3c/trace-context/blob/main/implementations.md
Vendors adopting W3C
OpenTelemetry
©2008–22 New Relic, Inc. All rights reserved
is
What
OpenTelemetry
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
What is OpenTelemetry?
+ =
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
What is OpenTelemetry?
My Awesome API
A B
D C
It’s an Open Source
Standard for
instrumentation
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
What is OpenTelemetry?
My Awesome API
A B
D C
Vendor Neutral
Export data into multiple data
backends for processing
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Distributed Tracing
Implementing
For Kafka
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
What happens with Kafka?
My Awesome API
A
B
Context is lost and
transaction is broken
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
What happens with Kafka?
Consumer
Producer
Kafka decouples producers and consumers
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
CONTEXT
Adding
to traces
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Otelsarama injects context into
Kafka messages, allowing you to
trace asynchronous messages
©2008–22 New Relic, Inc. All rights reserved
Kafka Headers
Message
HEADER
Add metadata about your record without
changing the content of your message
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Kafka Headers
Message
Let’s inject context (W3C Tracing Headers)
TraceState
TraceParent
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Kafka Headers
Producer
TraceState
Message
TraceParent
Producer needs to
Inject context into
Kafka Header
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Kafka Headers
TraceState
Message
TraceParent
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Kafka Headers
Consumer
TraceState
Message
TraceParent
ask.danielkim.fun
Context (Parent Span)
Timestamps
Attributes
…
©2008–22 New Relic, Inc. All rights reserved
Let’s print out what is injected in the header
kcat
https://github.com/edenhill/kcat
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Reading the trace headers
00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01
Tracing Standard Version
number (Base16)
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Reading the trace headers
00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01
Trace ID (Base16)
This is the ID of the whole trace and is used to uniquely identify
a distributed trace through a system.
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Reading the trace headers
00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01
Trace
Parent(Base16)
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Reading the trace headers
00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01
Tracing Flags
This flag indicates that the Span
has been sampled and will be
exported
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
OTel in Action
Consumer
Producer
Traceid
Spanid
Timestamps
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Live Demo
github.com/lazyplatypus/kafka-otel
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
©2008–22 New Relic, Inc. All rights reserved
What’s Happening?
message
Consumer
The consumer was busy
handling other requests so the
message was sitting in the kafka
queue
©2008–22 New Relic, Inc. All rights reserved
What’s Happening?
ConsumeClaim
time.Sleep
printMessage
MarkMessage
tweetMessage
©2008–22 New Relic, Inc. All rights reserved
WE
What have
Learned?
©2008–22 New Relic, Inc. All rights reserved
Caveat:
Implementing Distributed Tracing is NOT EASY
Especially at Scale, a lot of Cross Functional work to make sure
tracing is implemented in every microservice in your set up.
It is gonna be a lot of data, even when you sample every 10k, 50k,
100k traces.
©2008–22 New Relic, Inc. All rights reserved
Distributed Tracing is AWESOME!
- It gets you better visibility into the WHY on things are behaving
funky and helps you optimize your data ingest pipeline
- It helps you be a better neighbor
- Helps you tweak config variables to make your system more
performant.
- OSS Instrumentation/standards is the future of observability!
ask.danielkim.fun
©2008–22 New Relic, Inc. All rights reserved
Thank you!
twitter.com/learnwdaniel
ask.danielkim.fun

More Related Content

What's hot

What's hot (20)

Exploring the power of OpenTelemetry on Kubernetes
Exploring the power of OpenTelemetry on KubernetesExploring the power of OpenTelemetry on Kubernetes
Exploring the power of OpenTelemetry on Kubernetes
 
OpenTelemetry For Operators
OpenTelemetry For OperatorsOpenTelemetry For Operators
OpenTelemetry For Operators
 
Opentelemetry - From frontend to backend
Opentelemetry - From frontend to backendOpentelemetry - From frontend to backend
Opentelemetry - From frontend to backend
 
CI-CD Jenkins, GitHub Actions, Tekton
CI-CD Jenkins, GitHub Actions, Tekton CI-CD Jenkins, GitHub Actions, Tekton
CI-CD Jenkins, GitHub Actions, Tekton
 
OSMC 2022 | OpenTelemetry 101 by Dotan Horovit s.pdf
OSMC 2022 | OpenTelemetry 101 by Dotan Horovit s.pdfOSMC 2022 | OpenTelemetry 101 by Dotan Horovit s.pdf
OSMC 2022 | OpenTelemetry 101 by Dotan Horovit s.pdf
 
Improve monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss toolsImprove monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss tools
 
Elastic-Engineering
Elastic-EngineeringElastic-Engineering
Elastic-Engineering
 
OpenTelemetry Introduction
OpenTelemetry Introduction OpenTelemetry Introduction
OpenTelemetry Introduction
 
OpenTelemetry For Developers
OpenTelemetry For DevelopersOpenTelemetry For Developers
OpenTelemetry For Developers
 
Observability
ObservabilityObservability
Observability
 
OpenShift 4, the smarter Kubernetes platform
OpenShift 4, the smarter Kubernetes platformOpenShift 4, the smarter Kubernetes platform
OpenShift 4, the smarter Kubernetes platform
 
Open shift 4 infra deep dive
Open shift 4    infra deep diveOpen shift 4    infra deep dive
Open shift 4 infra deep dive
 
Service Mesh - Observability
Service Mesh - ObservabilityService Mesh - Observability
Service Mesh - Observability
 
OpenTelemetry: From front- to backend (2022)
OpenTelemetry: From front- to backend (2022)OpenTelemetry: From front- to backend (2022)
OpenTelemetry: From front- to backend (2022)
 
Shift left Observability
Shift left ObservabilityShift left Observability
Shift left Observability
 
Service mesh
Service meshService mesh
Service mesh
 
Microservices & API Gateways
Microservices & API Gateways Microservices & API Gateways
Microservices & API Gateways
 
Everything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed TracingEverything You wanted to Know About Distributed Tracing
Everything You wanted to Know About Distributed Tracing
 
Combining logs, metrics, and traces for unified observability
Combining logs, metrics, and traces for unified observabilityCombining logs, metrics, and traces for unified observability
Combining logs, metrics, and traces for unified observability
 
Apigee Demo: API Platform Overview
Apigee Demo: API Platform OverviewApigee Demo: API Platform Overview
Apigee Demo: API Platform Overview
 

Similar to Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summit London 2022

PostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical ReplicationPostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical Replication
Atsushi Torikoshi
 

Similar to Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summit London 2022 (20)

Building A System That Never Stops [FutureStack16 NYC]
Building  A System That Never Stops [FutureStack16 NYC]Building  A System That Never Stops [FutureStack16 NYC]
Building A System That Never Stops [FutureStack16 NYC]
 
Uncover the Root Cause of Kafka Performance Anomalies, Daniel Kim & Antón Rod...
Uncover the Root Cause of Kafka Performance Anomalies, Daniel Kim & Antón Rod...Uncover the Root Cause of Kafka Performance Anomalies, Daniel Kim & Antón Rod...
Uncover the Root Cause of Kafka Performance Anomalies, Daniel Kim & Antón Rod...
 
Best Practices for Measuring your Code Pipeline
Best Practices for Measuring your Code PipelineBest Practices for Measuring your Code Pipeline
Best Practices for Measuring your Code Pipeline
 
The Role of Standards in IoT Security
The Role of Standards in IoT SecurityThe Role of Standards in IoT Security
The Role of Standards in IoT Security
 
New Relic Infrastructure - New Integrations For Smarter and Faster Cloud Adop...
New Relic Infrastructure - New Integrations For Smarter and Faster Cloud Adop...New Relic Infrastructure - New Integrations For Smarter and Faster Cloud Adop...
New Relic Infrastructure - New Integrations For Smarter and Faster Cloud Adop...
 
PostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical ReplicationPostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical Replication
 
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
Charles sonigo - Demuxed 2018 - How to be data-driven when you aren't Netflix...
 
Media processing with serverless architecture
Media processing with serverless architectureMedia processing with serverless architecture
Media processing with serverless architecture
 
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
RTBkit Meetup - Developer Spotlight, Behind the Scenes of RTBkit and Intro to...
 
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
 
Building a System That Never Stops New Relic at Scale
Building a System That Never Stops New Relic at ScaleBuilding a System That Never Stops New Relic at Scale
Building a System That Never Stops New Relic at Scale
 
From shipping rpms to helm charts - Lessons learned and best practices
From shipping rpms to helm charts - Lessons learned and best practicesFrom shipping rpms to helm charts - Lessons learned and best practices
From shipping rpms to helm charts - Lessons learned and best practices
 
Understanding Microservice Latency for DevOps Teams: An Introduction to New R...
Understanding Microservice Latency for DevOps Teams: An Introduction to New R...Understanding Microservice Latency for DevOps Teams: An Introduction to New R...
Understanding Microservice Latency for DevOps Teams: An Introduction to New R...
 
“Quantum” Performance Effects: beyond the Core
“Quantum” Performance Effects: beyond the Core“Quantum” Performance Effects: beyond the Core
“Quantum” Performance Effects: beyond the Core
 
From Copycat Codelets to an AI Market Internet Protocol
From Copycat Codelets to an AI Market Internet ProtocolFrom Copycat Codelets to an AI Market Internet Protocol
From Copycat Codelets to an AI Market Internet Protocol
 
Software Stacks to enable SDN and NFV
Software Stacks to enable SDN and NFVSoftware Stacks to enable SDN and NFV
Software Stacks to enable SDN and NFV
 
Overpowered Kubernetes: CI/CD for K8s on Enterprise IaaS
Overpowered Kubernetes: CI/CD for K8s on Enterprise IaaSOverpowered Kubernetes: CI/CD for K8s on Enterprise IaaS
Overpowered Kubernetes: CI/CD for K8s on Enterprise IaaS
 
The impact of IOT - exchange cala - 2015
The impact of IOT - exchange cala - 2015The impact of IOT - exchange cala - 2015
The impact of IOT - exchange cala - 2015
 
Monitoring the Dynamic Nature of Cloud Computing
Monitoring the Dynamic Nature of Cloud ComputingMonitoring the Dynamic Nature of Cloud Computing
Monitoring the Dynamic Nature of Cloud Computing
 
Hardening Your CI/CD Pipelines with GitOps and Continuous Security
Hardening Your CI/CD Pipelines with GitOps and Continuous SecurityHardening Your CI/CD Pipelines with GitOps and Continuous Security
Hardening Your CI/CD Pipelines with GitOps and Continuous Security
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
HostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
HostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Recently uploaded (20)

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern Enterprise
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software Engineering
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 

Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summit London 2022

  • 1. ©2008–22 New Relic, Inc. All rights reserved Daniel Kim, Lead Developer Relations Engineer @ New Relic Kafka Tracing with OpenTelemetry
  • 2. ©2008–22 New Relic, Inc. All rights reserved Daniel Kim Lead Developer Advocate, New Relic - Likes spicy food (hotpot) - Likes spicy talks - Likes spicy tweets - Follow me on @learnwdaniel
  • 3. ©2008–22 New Relic, Inc. All rights reserved Agenda - Why trace Kafka - What is Distributed Tracing? - What is OpenTelemetry? - Implementing OpenTelemetry with Kafka - Live Demo
  • 4. ©2008–22 New Relic, Inc. All rights reserved I decided to do a live demo
  • 5. ©2008–22 New Relic, Inc. All rights reserved Live Demo Kafka Producer Kafka Consumer
  • 6. ©2008–22 New Relic, Inc. All rights reserved We need test data - so please ask questions! ask.danielkim.fun
  • 7. ©2008–22 New Relic, Inc. All rights reserved Trace Why Kafka? ask.danielkim.fun
  • 8. ©2008–22 New Relic, Inc. All rights reserved
  • 9. ©2008–22 New Relic, Inc. All rights reserved NRDB 200 petabytes a month = 9,000,000 records/sec ask.danielkim.fun
  • 10. ©2008–22 New Relic, Inc. All rights reserved HTTP Requests NRDB (Durable Storage) ask.danielkim.fun
  • 11. ©2008–22 New Relic, Inc. All rights reserved Consumer Microservice Event 1 Event 2 Event 3 Consumer Microservice Consumer Microservice Partition Topic Broker ask.danielkim.fun
  • 12. ©2008–22 New Relic, Inc. All rights reserved Aggregate Normalize Decorate Transform NRDB ask.danielkim.fun
  • 13. ©2008–22 New Relic, Inc. All rights reserved Aggregate Normalize Decorate Transform NRDB ask.danielkim.fun
  • 14. ©2008–22 New Relic, Inc. All rights reserved Kafka retains messages for a set amount of time so consumers can potentially go down or lag behind incoming data and recover without data loss ask.danielkim.fun
  • 15. ©2008–22 New Relic, Inc. All rights reserved Why do you need to trace Kafka? ask.danielkim.fun
  • 16. ©2008–22 New Relic, Inc. All rights reserved Agent How do we optimize “time to glass”? ask.danielkim.fun
  • 17. ©2008–22 New Relic, Inc. All rights reserved Service Service Service Service NRDB 1. Debug Bottlenecks in the ingest pipeline
  • 18. ©2008–22 New Relic, Inc. All rights reserved Service Service [Canary] Service Service 2. Get visibility into Downstream impacts of your service
  • 19. ©2008–22 New Relic, Inc. All rights reserved Service Service] Service Service 3. Measure Data Loss
  • 20. ©2008–22 New Relic, Inc. All rights reserved Service A 4. Fine Tune Kafka Config (Handling BackPressure) Consumer Producer
  • 21. ©2008–22 New Relic, Inc. All rights reserved Service A Consumer Producer 4. Fine Tune Kafka Config (Handling BackPressure)
  • 22. ©2008–22 New Relic, Inc. All rights reserved Producer Batch 1 5. Fine Tune Kafka Config (Batching/Compressing Messages) Message 1 Message 2 Message 3 Message 4 Message 5 Message 6 📦 batch.size = 6 Compression
  • 23. ©2008–22 New Relic, Inc. All rights reserved Producer Batch 2 5. Fine Tune Kafka Config (Batching/Compressing Messages) Message 1 Message 2 Message 3 📦 batch.size = 6 ⌛linger.ms = 20 📦 batch.size = 6 Compression
  • 24. ©2008–22 New Relic, Inc. All rights reserved is What Distributed Tracing? ask.danielkim.fun
  • 25. ©2008–22 New Relic, Inc. All rights reserved What is Distributed Tracing? User My Awesome API ask.danielkim.fun
  • 26. ©2008–22 New Relic, Inc. All rights reserved What is Distributed Tracing? User My Awesome API …. 13 SECONDS LATER ask.danielkim.fun
  • 27. ©2008–22 New Relic, Inc. All rights reserved What is Distributed Tracing? User My Awesome API …. 13 SECONDS LATER A B D C ask.danielkim.fun
  • 28. ©2008–22 New Relic, Inc. All rights reserved How do we connect the dots? Context ask.danielkim.fun
  • 29. ©2008–22 New Relic, Inc. All rights reserved Context in action My Awesome API A B We only see one side of the transaction. ask.danielkim.fun
  • 30. ©2008–22 New Relic, Inc. All rights reserved Context in action My Awesome API A .Here A is sending a payload to somewhere. ask.danielkim.fun
  • 31. ©2008–22 New Relic, Inc. All rights reserved Context in action My Awesome API B Here B is receiving a payload from somewhere ask.danielkim.fun
  • 32. ©2008–22 New Relic, Inc. All rights reserved Context in action My Awesome API A B Each Individual Span gets its own id ask.danielkim.fun
  • 33. ©2008–22 New Relic, Inc. All rights reserved Context in action My Awesome API A B Each Individual Span gets its own id (transaction) And the whole trace gets its own id (parentid) ask.danielkim.fun
  • 34. ©2008–22 New Relic, Inc. All rights reserved Context in action My Awesome API A B Each Individual Span gets its own id (transaction) And the whole trace gets its own id (parentid) Both data points get encoded as a set of HTTP headers ask.danielkim.fun
  • 35. ©2008–22 New Relic, Inc. All rights reserved Context in action My Awesome API A B transaction=12H4 parentid=x1Fd Context x1Fd Overall Trace 12H4 And the parent span is x1FD h4a5 ask.danielkim.fun
  • 36. ©2008–22 New Relic, Inc. All rights reserved Context in action My Awesome API A B transaction=12H4 parentid=x1Fd Received by A x1Fd Overall Trace 12H4 And the parent span is x1FD h4a5 x1Fd h4a5 12H4 ask.danielkim.fun
  • 37. ©2008–22 New Relic, Inc. All rights reserved A B What if we have microservices with different languages or frameworks? What if microservices are using different observability systems? What if we have decoupled microservices via Kafka? B ask.danielkim.fun
  • 38. ©2008–22 New Relic, Inc. All rights reserved W3C standard ask.danielkim.fun
  • 39. ©2008–22 New Relic, Inc. All rights reserved A C X-NEWRELIC : mY1d3 B ask.danielkim.fun
  • 40. ©2008–22 New Relic, Inc. All rights reserved A C TraceParent: … TraceState: NR=... TraceParent: … TraceState: NR=... B ask.danielkim.fun
  • 41. ©2008–22 New Relic, Inc. All rights reserved OSS Standard for Context Propagation traceparent tracestate A B W3C Trace Context enables cross-vendor interoperation of traces ask.danielkim.fun
  • 42. ©2008–22 New Relic, Inc. All rights reserved Adoption of W3C Tracing standard https://github.com/w3c/trace-context/blob/main/implementations.md Vendors adopting W3C OpenTelemetry
  • 43. ©2008–22 New Relic, Inc. All rights reserved is What OpenTelemetry ask.danielkim.fun
  • 44. ©2008–22 New Relic, Inc. All rights reserved What is OpenTelemetry? + = ask.danielkim.fun
  • 45. ©2008–22 New Relic, Inc. All rights reserved What is OpenTelemetry? My Awesome API A B D C It’s an Open Source Standard for instrumentation ask.danielkim.fun
  • 46. ©2008–22 New Relic, Inc. All rights reserved What is OpenTelemetry? My Awesome API A B D C Vendor Neutral Export data into multiple data backends for processing ask.danielkim.fun
  • 47. ©2008–22 New Relic, Inc. All rights reserved Distributed Tracing Implementing For Kafka ask.danielkim.fun
  • 48. ©2008–22 New Relic, Inc. All rights reserved What happens with Kafka? My Awesome API A B Context is lost and transaction is broken ask.danielkim.fun
  • 49. ©2008–22 New Relic, Inc. All rights reserved What happens with Kafka? Consumer Producer Kafka decouples producers and consumers ask.danielkim.fun
  • 50. ©2008–22 New Relic, Inc. All rights reserved CONTEXT Adding to traces ask.danielkim.fun
  • 51. ©2008–22 New Relic, Inc. All rights reserved Otelsarama injects context into Kafka messages, allowing you to trace asynchronous messages
  • 52. ©2008–22 New Relic, Inc. All rights reserved Kafka Headers Message HEADER Add metadata about your record without changing the content of your message ask.danielkim.fun
  • 53. ©2008–22 New Relic, Inc. All rights reserved Kafka Headers Message Let’s inject context (W3C Tracing Headers) TraceState TraceParent ask.danielkim.fun
  • 54. ©2008–22 New Relic, Inc. All rights reserved Kafka Headers Producer TraceState Message TraceParent Producer needs to Inject context into Kafka Header ask.danielkim.fun
  • 55. ©2008–22 New Relic, Inc. All rights reserved Kafka Headers TraceState Message TraceParent ask.danielkim.fun
  • 56. ©2008–22 New Relic, Inc. All rights reserved Kafka Headers Consumer TraceState Message TraceParent ask.danielkim.fun Context (Parent Span) Timestamps Attributes …
  • 57. ©2008–22 New Relic, Inc. All rights reserved Let’s print out what is injected in the header kcat https://github.com/edenhill/kcat ask.danielkim.fun
  • 58. ©2008–22 New Relic, Inc. All rights reserved Reading the trace headers 00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01 Tracing Standard Version number (Base16) ask.danielkim.fun
  • 59. ©2008–22 New Relic, Inc. All rights reserved Reading the trace headers 00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01 Trace ID (Base16) This is the ID of the whole trace and is used to uniquely identify a distributed trace through a system. ask.danielkim.fun
  • 60. ©2008–22 New Relic, Inc. All rights reserved Reading the trace headers 00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01 Trace Parent(Base16) ask.danielkim.fun
  • 61. ©2008–22 New Relic, Inc. All rights reserved Reading the trace headers 00-6626f84bd20a9a171a699a560387102b-10ba2f2a74c884bf-01 Tracing Flags This flag indicates that the Span has been sampled and will be exported ask.danielkim.fun
  • 62. ©2008–22 New Relic, Inc. All rights reserved OTel in Action Consumer Producer Traceid Spanid Timestamps ask.danielkim.fun
  • 63. ©2008–22 New Relic, Inc. All rights reserved Live Demo github.com/lazyplatypus/kafka-otel ask.danielkim.fun
  • 64. ©2008–22 New Relic, Inc. All rights reserved
  • 65. ©2008–22 New Relic, Inc. All rights reserved What’s Happening? message Consumer The consumer was busy handling other requests so the message was sitting in the kafka queue
  • 66. ©2008–22 New Relic, Inc. All rights reserved What’s Happening? ConsumeClaim time.Sleep printMessage MarkMessage tweetMessage
  • 67. ©2008–22 New Relic, Inc. All rights reserved WE What have Learned?
  • 68. ©2008–22 New Relic, Inc. All rights reserved Caveat: Implementing Distributed Tracing is NOT EASY Especially at Scale, a lot of Cross Functional work to make sure tracing is implemented in every microservice in your set up. It is gonna be a lot of data, even when you sample every 10k, 50k, 100k traces.
  • 69. ©2008–22 New Relic, Inc. All rights reserved Distributed Tracing is AWESOME! - It gets you better visibility into the WHY on things are behaving funky and helps you optimize your data ingest pipeline - It helps you be a better neighbor - Helps you tweak config variables to make your system more performant. - OSS Instrumentation/standards is the future of observability! ask.danielkim.fun
  • 70. ©2008–22 New Relic, Inc. All rights reserved Thank you! twitter.com/learnwdaniel ask.danielkim.fun