OpenTelemetry For Operators
Presented by Kevin Brockhoff
Apache 2.0 Licensed
Our Agenda
● Why are current observability platforms falling short?
● What OpenTelemetry features address these issues?
● How do I run OpenTelemetry components in production?
● Who are the innovators in the observability space?
Level Setting
● Have you used ELK stack or other log aggregator?
● Have you used an APM system?
● Have you used distributed tracing before?
Who am I?
● Kevin Brockhoff - Senior Consultant, Daugherty Business Solutions
  ○ Solving difficult cloud adoption challenges for Daugherty's Fortune 500 clients
  ○ OpenTelemetry committer since early stages of the project
  ○ GitHub: https://github.com/kbrockhoff
  ○ LinkedIn: https://www.linkedin.com/in/kevin-brockhoff-a557877/
Observability Today
Enterprise Applications
● Only instrumented with logging during initial development.
  ○ Logging oriented toward development, not operations
● Metrics and tracing only added later if at all as a separate project.
  ○ Each team creates their own system using familiar tools
  ○ Or enterprise commits to a specific APM vendor
● Logs, metrics and traces are never connected.
First Generation Observability Platforms
● Search logs in ELK, lack context
● Homegrown tracing per app, mainly accessible by developers
● Customer experience metrics
● Low-level metrics and alerts
OpenTelemetry Project
OpenCensus + OpenTracing = OpenTelemetry
● OpenTracing:
  ○ Provides APIs and instrumentation for distributed tracing
● OpenCensus:
  ○ Provides APIs and instrumentation that allow you to collect application metrics and distributed tracing.
● OpenTelemetry:
  ○ An effort to combine distributed tracing, metrics and logging into a single set of system components and language-specific libraries.
OpenTelemetry Project
● Specification
  ○ API (for application developers)
  ○ SDK Implementations
  ○ Transport Protocol (Protobuf)
● Collector (middleware)
● SDKs (various stages of maturity)
  ○ C++
  ○ C# (Auto-instrument/Manual)
  ○ Erlang
  ○ Go
  ○ JavaScript (Browser/Node)
  ○ Java (Auto-instrument/Manual)
    ■ Android compatibility
  ○ PHP
  ○ Python (Auto-instrument/Manual)
  ○ Ruby
  ○ Rust
  ○ Swift
From Observability 1.0 to 2.0
OpenTelemetry Collector
● Offers a vendor-agnostic implementation on how to receive, process and export telemetry data.
● Removes the need to run, operate and maintain multiple agents/collectors.
● Supports open-source telemetry data formats (e.g. OTLP, Jaeger, Prometheus) sending to multiple open-source or commercial back-ends.
Collector Concepts
● Telemetry data processing pipelines
  ○ Per pipeline: Receiver(s) -> Processors -> Exporter(s)
  ○ Currently only single telemetry type pipelines supported
● Extensions
  ○ Supporting functionality
  ○ Core collector extensions
    ■ health_check - HTTP endpoint for load balancer or k8s controller
    ■ zpages - Internal processing metrics and traces accessible via HTTP
    ■ pprof - Performance profiler enables the golang net/http/pprof endpoint
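The pipeline shape above can be illustrated with a minimal Python sketch. It is not the collector's Go implementation: the `Pipeline` class and the two toy processors are hypothetical names, and a "batch" is reduced to a list of span names. It only shows that data from any receiver passes through the processors in their listed order and is then copied to every exporter.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A "batch" here is just a list of span names; real telemetry is richer.
Batch = List[str]
Processor = Callable[[Batch], Batch]
Exporter = Callable[[Batch], None]


@dataclass
class Pipeline:
    """Hypothetical model of one single-signal collector pipeline."""
    processors: List[Processor] = field(default_factory=list)
    exporters: List[Exporter] = field(default_factory=list)

    def consume(self, batch: Batch) -> None:
        # Any receiver attached to the pipeline would call consume().
        for process in self.processors:  # processors run in listed order
            batch = process(batch)
        for export in self.exporters:  # every exporter gets its own copy
            export(list(batch))


def drop_health_checks(batch: Batch) -> Batch:
    return [name for name in batch if name != "GET /health"]


def uppercase_names(batch: Batch) -> Batch:
    return [name.upper() for name in batch]


sent_to_jaeger: Batch = []
sent_to_file: Batch = []
traces = Pipeline(
    processors=[drop_health_checks, uppercase_names],
    exporters=[sent_to_jaeger.extend, sent_to_file.extend],
)
traces.consume(["GET /cart", "GET /health"])  # e.g. from an OTLP receiver
traces.consume(["POST /order"])               # e.g. from a Jaeger receiver
print(sent_to_jaeger)
```

Swapping the two processors would change the result, which is why the collector treats the processor list as ordered.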
Collector Bundled Receivers
Traces
● Jaeger
  ○ Compact Thrift, Binary Thrift, HTTP, gRPC
  ○ Sampling strategy configuration server
● Kafka
  ○ OTLP, Jaeger, Zipkin data structures
● OpenCensus
● OTLP (OpenTelemetry Protocol)
  ○ gRPC, HTTP
● Zipkin
  ○ v1, v1 Thrift, v2, v2 Protobuf
Metrics
● Host metrics scraper
  ○ cpu, disk, load, filesystem, memory, network, processes, swap, process
● Kafka
  ○ OTLP
● OpenCensus
● OTLP (OpenTelemetry Protocol)
  ○ gRPC, HTTP
● Prometheus
  ○ Full discovery and polling capabilities
Logs
● Fluent Forward
  ○ Spec compliant except no mTLS
Collector Contrib Receivers
Traces
● AWS X-Ray
● SignalFX APM v1
Metrics
● AWS ECS Container
● Carbon
● CollectD (JSON only)
● Docker Stats
● Kubernetes Cluster
● Kubernetes Kubelet
● Prometheus Exporters
● Redis INFO
● SignalFX
● Splunk HEC
● StatsD
● Wavefront
Logs
● SignalFX (Events)
● Stanza
Collector Bundled Processors
● Attributes
  ○ Modifies span attributes
● Batch
  ○ Groups data into batches
● Filter
  ○ Include/exclude metrics by name
● Group by Trace
  ○ Holds all spans for a trace for a set time and then sends them to the next processor
● Memory Limiter
  ○ Prevents out-of-memory issues by triggering GC
  ○ Configuration must match the ballast setting the collector is launched with
● Queued Retry
  ○ Deprecated; each exporter now implements its own queueing and retry
● Resource
  ○ Applies changes to Resource attributes
● Probabilistic Sampling
  ○ Adjusts TraceID hash-based sampling decisions by sampling.priority attribute value
● Tail Sampling
  ○ Sampling decisions based on configured attribute values and rate limits
● Span
  ○ Modifies span name or attributes based on span name
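The Probabilistic Sampling entry above combines two ideas: a decision derived from a hash of the TraceID, so every span of a trace gets the same answer, and an override from the `sampling.priority` attribute. The Python sketch below illustrates that idea only. SHA-256 stands in for whatever hash the collector actually uses, `keep_span` and `NUM_BUCKETS` are hypothetical names, and treating a priority above zero as "always keep" and zero as "drop" is an assumption about the override semantics.

```python
import hashlib
from typing import Optional

NUM_BUCKETS = 10_000  # resolution of 0.01 percent (an arbitrary choice)


def keep_span(trace_id: str, sampling_percentage: float,
              sampling_priority: Optional[int] = None) -> bool:
    """Decide whether to keep a span; all spans of one trace agree."""
    if sampling_priority is not None:
        # Assumed semantics: an explicit priority overrides the hash.
        return sampling_priority > 0
    # SHA-256 is a stand-in, not the collector's own hash function.
    digest = hashlib.sha256(trace_id.encode("ascii")).digest()
    bucket = int.from_bytes(digest[:8], "big") % NUM_BUCKETS
    return bucket < sampling_percentage / 100 * NUM_BUCKETS


trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
# The same TraceID always yields the same decision, on any instance.
print(keep_span(trace_id, 25.0) == keep_span(trace_id, 25.0))
```

Because the decision depends only on the TraceID and the configured percentage, several collector instances can sample independently and still agree, which is not true of tail sampling.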
Recommended Processor Configuration
Traces
1. memory_limiter
2. any sampling processors
3. batch
4. any other processors
Metrics
1. memory_limiter
2. any filtering processors
3. batch
4. any other processors
Memory limiter ballast_size_mib must match the --mem-ballast-size-mib command line parameter. Trigger GC with either limit_mib / spike_limit_mib or limit_percentage / spike_limit_percentage.
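The recommended ordering can be checked mechanically before a configuration is deployed. The following Python sketch is a hypothetical helper, not a collector feature. It assumes the processor names `probabilistic_sampler`, `tail_sampling` and `filter` as the sampling and filtering processors of interest, and it flags a pipeline whose `memory_limiter` is not first or whose sampling or filtering processors run after `batch`.

```python
from typing import List

SAMPLING = {"probabilistic_sampler", "tail_sampling"}
FILTERING = {"filter"}


def order_problems(processors: List[str]) -> List[str]:
    """Return readable problems; an empty list means the order looks fine."""
    problems = []
    if "memory_limiter" not in processors:
        problems.append("memory_limiter is missing")
    elif processors[0] != "memory_limiter":
        problems.append("memory_limiter should come first")
    if "batch" in processors:
        batch_at = processors.index("batch")
        # Anything that drops data should run before batching.
        for name in processors[batch_at + 1:]:
            if name in SAMPLING | FILTERING:
                problems.append(f"{name} should come before batch")
    return problems


print(order_problems(["memory_limiter", "probabilistic_sampler", "batch"]))
print(order_problems(["batch", "memory_limiter", "filter"]))
```

Dropping data before batching means the batch processor never spends memory on telemetry that is about to be discarded.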
Collector Contrib Processors
● Kubernetes
  ○ Adds metadata from pod
● Metrics Transform
  ○ Renames/aggregations within individual metrics
● Resource Detection
  ○ OTEL_RESOURCE environment variable
  ○ GCE metadata server
  ○ EC2 instance metadata server
● Routing
  ○ Route to particular exporter based on incoming header value
TODO
● Span data sharding by TraceID
Collector Bundled Exporters
Traces
● File
  ○ JSON format
● Jaeger
  ○ v2 gRPC
● Kafka
  ○ OTLP, Jaeger, Zipkin
● Logging
  ○ Debugging
● OpenCensus
● OTLP (OpenTelemetry Protocol)
● Zipkin
  ○ v2 JSON or Protobuf
Metrics
● File
  ○ JSON format
● Logging
  ○ Debugging
● OpenCensus
● OTLP (OpenTelemetry Protocol)
● Prometheus
  ○ Metrics endpoint for Prometheus to pull from
● Prometheus Remote Write
  ○ Pushes metrics in Prometheus TimeSeries format (Cortex, etc.)
Collector Contrib Exporters
Traces
● AlibabaCloud LogService
● AWS X-Ray
● Azure Monitor
● Datadog
● Elastic
● Honeycomb
● Jaeger v1 Thrift
● AWS Kinesis (Jaeger proto)
● New Relic
● SignalFX APM
● Sentry
● Stackdriver
Metrics
● AlibabaCloud LogService
● AWS CloudWatch EMF
● Carbon
● Datadog
● Elastic
● New Relic
● SignalFX
● Splunk HEC
● Stackdriver
Vendor Hosted Exporters
Traces
● Dynatrace OneAgent
● Lightstep Launchers
Metrics
● Dynatrace OneAgent
● Lightstep Launchers
Full Configuration File Example
receivers:
  otlp:
    protocols:
      grpc:
        max_recv_msg_size_mib: 32
        max_concurrent_streams: 16
        read_buffer_size: 1024
        write_buffer_size: 1024
        keepalive:
          server_parameters:
            max_connection_idle: 10s
processors:
  memory_limiter:
    ballast_size_mib: 192
    check_interval: 5s
    limit_mib: 448
    spike_limit_mib: 64
  batch:
    send_batch_size: 64
    timeout: 15s
exporters:
  jaeger:
    endpoint: jaeger.monitoring.svc.storefront-development.local.:14250
    timeout: 10s
    sending_queue:
      enabled: true
      num_consumers: 2
      queue_size: 10
    retry_on_failure:
      enabled: true
      initial_interval: 10s
      max_interval: 60s
      max_elapsed_time: 10m
  prometheusremotewrite:
    namespace: "monitoring"
    sending_queue:
      enabled: true
      num_consumers: 2
      queue_size: 10
    retry_on_failure:
      enabled: true
      initial_interval: 10s
      max_interval: 60s
      max_elapsed_time: 10m
    endpoint: ":8888"
    ca_file: "/etc/pki/tls/certs/carbon-lb.pem"
    write_buffer_size: 524288
    headers:
      Prometheus-Remote-Write-Version: "0.1.0"
      X-Scope-OrgID: 234
extensions:
  health_check:
    port: 13133
  zpages:
    endpoint: :55679
service:
  extensions: [zpages, health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [jaeger]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
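A frequent startup failure is a pipeline that names a receiver, processor or exporter which has no definition in the same file. The Python sketch below shows one way to catch this before deploying. It is a hypothetical pre-flight check, not part of the collector, and it works on a hand-written dictionary that mirrors only the structure of the example (component settings are omitted) so that it needs no YAML library.

```python
from typing import Dict, List

# Hand-written mirror of the example's structure; settings are omitted.
config = {
    "receivers": {"otlp": {}},
    "processors": {"memory_limiter": {}, "batch": {}},
    "exporters": {"jaeger": {}, "prometheusremotewrite": {}},
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["otlp"],
                "processors": ["memory_limiter", "batch"],
                "exporters": ["jaeger"],
            },
            "metrics": {
                "receivers": ["otlp"],
                "processors": ["memory_limiter", "batch"],
                "exporters": ["prometheusremotewrite"],
            },
        }
    },
}


def undefined_references(cfg: Dict) -> List[str]:
    """List pipeline entries that name a component with no definition."""
    missing = []
    for pipeline, parts in cfg["service"]["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            for name in parts.get(kind, []):
                if name not in cfg.get(kind, {}):
                    missing.append(
                        f"{pipeline}: {kind[:-1]} '{name}' is not defined")
    return missing


print(undefined_references(config))  # [] for the example above
```

The reverse check (components that are defined but never used in a pipeline) is also worth adding, since such components are silently inactive.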
Collector Command Line Example
/usr/local/bin/otelcol \
  --config=/usr/local/etc/otel-collector-config.yaml \
  --mem-ballast-size-mib=192 \
  --log-level=DEBUG
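Since the ballast size appears both on the command line and in the memory limiter settings, a deployment script can compare the two. The Python sketch below is a hypothetical check: `ballast_from_command` is an illustrative helper that only understands the `--flag=value` form used in this example, and the configured value is hard-coded rather than read from the YAML file.

```python
import shlex
from typing import Optional


def ballast_from_command(command: str) -> Optional[int]:
    """Extract the --mem-ballast-size-mib value from a command line."""
    for token in shlex.split(command):
        flag, _, value = token.partition("=")
        if flag == "--mem-ballast-size-mib" and value:
            return int(value)
    return None  # flag absent, or not in --flag=value form


command = (
    "/usr/local/bin/otelcol "
    "--config=/usr/local/etc/otel-collector-config.yaml "
    "--mem-ballast-size-mib=192 --log-level=DEBUG"
)
configured_ballast_mib = 192  # memory_limiter.ballast_size_mib in the config

if ballast_from_command(command) != configured_ballast_mib:
    raise SystemExit("ballast size on the command line does not match the config")
print("ballast settings match")
```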
Collector Docker Images
● otel/opentelemetry-collector
  ○ Core receivers, processors, and exporters bundled in
● otel/opentelemetry-collector-contrib
  ○ All core and contrib receivers, processors, and exporters bundled in
● OpenTelemetry Collector builder
  ○ https://github.com/observatorium/opentelemetry-collector-builder
Other Collector Installs
● RPM
  ○ Produced by opentelemetry-collector build
● Debian
  ○ Produced by opentelemetry-collector build
Observing the Collector
● health_check
  ○ http://<hostname>:13133/ returns basic pipeline availability
● zpages
  ○ RPC metric aggregations at http://<hostname>:55679/debug/rpcz
  ○ Trace summaries at http://<hostname>:55679/debug/tracez
● prometheus
  ○ Pipeline metrics scrape endpoint at http://<hostname>:8888/metrics
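The endpoints listed above lend themselves to a small smoke test after each deployment. The Python sketch below builds the URLs from the ports shown on this slide and reports which ones answer with HTTP 200. The function names are hypothetical, the ports are only correct if the collector is configured as in the earlier example, and the `fetch` parameter exists so the check can be exercised without a running collector.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict


def collector_urls(hostname: str) -> Dict[str, str]:
    """Diagnostic URLs, using the ports shown on this slide."""
    return {
        "health_check": f"http://{hostname}:13133/",
        "zpages_rpcz": f"http://{hostname}:55679/debug/rpcz",
        "zpages_tracez": f"http://{hostname}:55679/debug/tracez",
        "metrics": f"http://{hostname}:8888/metrics",
    }


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code, or 0 if the endpoint is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but not with success
    except (urllib.error.URLError, OSError):
        return 0


def check_collector(hostname: str,
                    fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each endpoint name to True when it answers with HTTP 200."""
    return {name: fetch(url) == 200
            for name, url in collector_urls(hostname).items()}
```

Calling `check_collector("localhost")` on a host running the collector returns one boolean per endpoint, which can be wired into a readiness probe or a post-deploy script.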
Current Gotchas
● Errors propagated back through pipelines and instances in the chain
  ○ Errors reported by SDK exporters in the applications may be coming from two hops downstream
● TraceID sharding not working correctly
  ○ Can only do tail-based sampling if running single instance of collector
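Tail-based sampling needs every span of a trace in one place, which is why it currently requires a single collector instance. The Python sketch below illustrates what TraceID sharding is meant to achieve: a first tier picks the downstream instance from the TraceID alone, so all spans of a trace meet on the same instance. This is an illustration of the idea, not the collector's implementation. The instance names are made up and SHA-256 with a simple modulo is an assumed, deliberately naive placement scheme.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

Span = Tuple[str, str]  # (trace_id, span_name)


def shard_for(trace_id: str, instances: List[str]) -> str:
    """Pick a collector instance from the TraceID alone."""
    digest = hashlib.sha256(trace_id.encode("ascii")).digest()
    return instances[int.from_bytes(digest[:8], "big") % len(instances)]


def route(spans: List[Span], instances: List[str]) -> Dict[str, List[Span]]:
    """Group spans by the second-tier instance that should receive them."""
    routed: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        routed[shard_for(span[0], instances)].append(span)
    return dict(routed)


instances = ["collector-0", "collector-1", "collector-2"]  # made-up names
spans = [("aaaa", "frontend"), ("bbbb", "frontend"),
         ("aaaa", "cart"), ("aaaa", "db")]
for instance, group in route(spans, instances).items():
    print(instance, group)
```

With plain modulo placement, changing the number of instances moves most traces to a different instance. A consistent-hashing ring would reduce that churn, at the cost of more code.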
Observability Platform Innovations
Latest Innovations
● Dynatrace automates manual quality validation processes using AI-assisted SLI/SLO-based quality gates.
● New Relic Incident Intelligence continuously analyzes alerts and incident data to find patterns in event sequences and offers suggested correlation decisions that merge incidents to reduce alert noise further.
● Splunk SignalFX provides high cardinality exploration of traces across different regions, hosts, versions or users.
● Lightstep provides rapid root cause analysis using unlimited cardinality and a high-fidelity dataset uncompromised by head or tail sampling.
Latest Innovations
● Datadog provides automated tagging and correlation of logs so you can jump from any log entry to related metrics.
● Honeycomb lets you break down on every dimension in your data, both the obvious fields and the surprising ones.
● Grafana Loki datasource provides switching from metrics to logs with preserved label filters.
● Elastic Observability brings your logs, metrics, and APM traces together at scale in a single stack.
Thank you!

Editor's Notes

  1. Copyright 2020, The OpenTelemetry Authors Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.