OpenTelemetry For Operators
Presented by Kevin Brockhoff
Apache 2.0 Licensed
Our Agenda
● Why are current observability platforms falling short?
● What OpenTelemetry features address these issues?
● How do I run OpenTelemetry components in production?
● Who are the innovators in the observability space?
Level Setting
● Have you used ELK stack or other log aggregator?
● Have you used an APM system?
● Have you used distributed tracing before?
Who am I?
● Kevin Brockhoff - Senior Consultant, Daugherty Business Solutions
○ Solving difficult cloud adoption challenges for Daugherty's Fortune 500 clients
○ OpenTelemetry committer since early stages of the project
○ GitHub: https://github.com/kbrockhoff
○ LinkedIn: https://www.linkedin.com/in/kevin-brockhoff-a557877/
Observability Today
Enterprise Applications
● Only instrumented with logging during initial development.
○ Logging is oriented toward development, not operations
● Metrics and tracing are only added later, if at all, as a separate project.
○ Each team creates its own system using familiar tools
○ Or the enterprise commits to a specific APM vendor
● Logs, metrics and traces are never connected.
First Generation Observability Platforms
● Search logs in ELK, lacking context
● Homegrown tracing per app, mainly accessible by developers
● Customer experience metrics
● Low-level metrics and alerts
OpenTelemetry Project
OpenCensus + OpenTracing = OpenTelemetry
● OpenTracing:
○ Provides APIs and instrumentation for distributed tracing
● OpenCensus:
○ Provides APIs and instrumentation that allow you to collect application metrics and
distributed tracing.
● OpenTelemetry:
○ An effort to combine distributed tracing, metrics and logging into a single set of system
components and language-specific libraries.
OpenTelemetry Project
● Specification
○ API (for application developers)
○ SDK Implementations
○ Transport Protocol (Protobuf)
● Collector (middleware)
● SDKs (various stages of maturity)
○ C++
○ C# (Auto-instrument/Manual)
○ Erlang
○ Go
○ JavaScript (Browser/Node)
○ Java (Auto-instrument/Manual)
■ Android compatibility
○ PHP
○ Python (Auto-instrument/Manual)
○ Ruby
○ Rust
○ Swift
From Observability 1.0 to 2.0
OpenTelemetry Collector
● Offers a vendor-agnostic implementation of how to receive, process and
export telemetry data.
● Removes the need to run, operate and maintain multiple
agents/collectors.
● Supports open-source telemetry data formats (e.g. OTLP, Jaeger,
Prometheus) and sends to multiple open-source or commercial back-ends.
Collector Concepts
● Telemetry data processing pipelines
○ Per pipeline: Receiver(s) -> Processors -> Exporter(s)
○ Currently only single telemetry type pipelines supported
● Extensions
○ Supporting functionality
○ Core collector extensions
■ health_check - HTTP endpoint for load balancer or k8s controller
■ zpages - Internal processing metrics and traces accessible via HTTP
■ pprof - Performance profiler enables the golang net/http/pprof endpoint
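The Receiver(s) -> Processors -> Exporter(s) flow above can be pictured with a small toy model. This is only a sketch of the concept, not collector code: the `Pipeline` class, the two processors, and the dict-based span format are all hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, List

Batch = List[dict]  # a batch of telemetry items, modeled here as plain dicts


class Pipeline:
    """Toy model of one collector pipeline for a single telemetry type."""

    def __init__(
        self,
        processors: Iterable[Callable[[Batch], Batch]],
        exporters: Iterable[Callable[[Batch], None]],
    ) -> None:
        self.processors = list(processors)
        self.exporters = list(exporters)

    def consume(self, batch: Batch) -> None:
        """Entry point that any receiver attached to this pipeline would call."""
        for process in self.processors:  # processors run in configured order
            batch = process(batch)
        for export in self.exporters:  # every exporter gets the processed data
            export(list(batch))


def drop_health_checks(batch: Batch) -> Batch:
    """Hypothetical processor: drop spans named 'GET /health'."""
    return [span for span in batch if span.get("name") != "GET /health"]


def add_environment(batch: Batch) -> Batch:
    """Hypothetical processor: stamp every span with an environment attribute."""
    return [{**span, "env": "dev"} for span in batch]


# Two in-memory lists stand in for two exporters (for example Jaeger and a file).
exported_a: Batch = []
exported_b: Batch = []
traces = Pipeline(
    processors=[drop_health_checks, add_environment],
    exporters=[exported_a.extend, exported_b.extend],
)
traces.consume([{"name": "GET /health"}, {"name": "GET /cart"}])
```

After the call, both stand-in exporters hold the same single span, with the health-check span dropped and the `env` attribute added, which mirrors how one pipeline fans out to every exporter it lists.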
Collector Bundled Receivers
Traces
● Jaeger
○ Compact Thrift, Binary Thrift, HTTP,
gRPC
○ Sampling strategy configuration server
● Kafka
○ OTLP, Jaeger, Zipkin data structures
● OpenCensus
● OTLP (OpenTelemetry Protocol)
○ gRPC, HTTP
● Zipkin
○ v1, v1 Thrift, v2, v2 Protobuf
Metrics
● Host metrics scraper
○ cpu, disk, load, filesystem, memory,
network, processes, swap, process
● Kafka
○ OTLP
● OpenCensus
● OTLP (OpenTelemetry Protocol)
○ gRPC, HTTP
● Prometheus
○ Full discovery and polling capabilities
Logs
● Fluent Forward
○ Spec compliant except no mTLS
Collector Contrib Receivers
Traces
● AWS X-Ray
● SignalFX APM v1
Metrics
● AWS ECS Container
● Carbon
● CollectD (JSON only)
● Docker Stats
● Kubernetes Cluster
● Kubernetes Kubelet
● Prometheus Exporters
● Redis INFO
● SignalFX
● Splunk HEC
● StatsD
● Wavefront
Logs
● SignalFX (Events)
● Stanza
Collector Bundled Processors
● Attributes
○ Modifies span attributes
● Batch
○ Groups data into batches
● Filter
○ Include/exclude metrics by name
● Group by Trace
○ Holds all spans for a trace for a set time
and then sends to next processor
● Memory Limiter
○ Prevents out-of-memory issues by
triggering GC
○ Configuration must match the ballast
setting the collector is launched with
● Queued Retry
○ Deprecated; each exporter now
implements its own queuing and retry
● Resource
○ Applies changes to Resource attributes
● Probabilistic Sampling
○ Adjusts TraceID hash-based sampling
decisions by sampling.priority
attribute value
● Tail Sampling
○ Sampling decisions based on configured
attribute values and rate limits
● Span
○ Modifies span name or attributes based
on span name
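The TraceID hash-based decision with a sampling.priority override, as described for the Probabilistic Sampling processor, can be sketched as follows. This is a simplified illustration and not the collector's implementation: the function name, the use of SHA-256 as the hash, and the rule that a positive priority keeps the trace while zero drops it are assumptions made for the example.

```python
import hashlib
from typing import Optional


def keep_trace(
    trace_id: str,
    sampling_percentage: float,
    sampling_priority: Optional[int] = None,
) -> bool:
    """Decide whether to keep a trace (hypothetical helper).

    An explicit sampling priority wins; otherwise the trace ID is hashed
    into one of 10,000 buckets and compared against the percentage.
    """
    if sampling_priority is not None:
        # Assumed semantics: > 0 forces keep, 0 forces drop.
        return sampling_priority > 0
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < sampling_percentage * 100
```

Because the decision depends only on the trace ID, every span of a trace gets the same answer on every collector instance, which is what makes hash-based sampling safe to run on multiple instances.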
Recommended Processor Configuration
Traces
memory_limiter
any sampling processors
batch
any other processors
Metrics
memory_limiter
any filtering processors
batch
any other processors
Memory limiter ballast_size_mib must match --mem-ballast-size-mib command line
parameter. Trigger GC with either limit_mib / spike_limit_mib or limit_percentage /
spike_limit_percentage.
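The recommended ordering can be checked mechanically before a configuration is deployed. The sketch below is a hypothetical lint helper, not part of the collector; the processor names in `SAMPLING_OR_FILTERING` are assumed to match the names used in a pipeline's `processors` list.

```python
from typing import List

# Assumed config names for the sampling and filtering processors.
SAMPLING_OR_FILTERING = {"probabilistic_sampler", "tail_sampling", "filter"}


def check_processor_order(processors: List[str]) -> List[str]:
    """Return human-readable problems with a pipeline's processor order."""
    problems: List[str] = []
    # memory_limiter should see data before anything else can buffer it.
    if "memory_limiter" in processors and processors[0] != "memory_limiter":
        problems.append("memory_limiter should be first")
    # Sampling and filtering should reduce the data before it is batched.
    if "batch" in processors:
        batch_at = processors.index("batch")
        for name in processors[batch_at + 1:]:
            if name in SAMPLING_OR_FILTERING:
                problems.append(f"{name} should come before batch")
    return problems
```

For example, `check_processor_order(["memory_limiter", "batch", "tail_sampling"])` reports that `tail_sampling` should come before `batch`, while `["memory_limiter", "batch"]` passes with no problems.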
Collector Contrib Processors
● Kubernetes
○ Adds metadata from pod
● Metrics Transform
○ Renames/aggregations within individual
metrics
● Resource Detection
○ OTEL_RESOURCE environment variable
○ GCE metadata server
○ EC2 instance metadata server
● Routing
○ Route to particular exporter based on
incoming header value
TODO
● Span data sharding by TraceID
Collector Bundled Exporters
Traces
● File
○ JSON format
● Jaeger
○ v2 gRPC
● Kafka
○ OTLP, Jaeger, Zipkin
● Logging
○ Debugging
● OpenCensus
● OTLP (OpenTelemetry Protocol)
● Zipkin
○ v2 JSON or Protobuf
Metrics
● File
○ JSON format
● Logging
○ Debugging
● OpenCensus
● OTLP (OpenTelemetry Protocol)
● Prometheus
○ Metrics endpoint for Prometheus to pull
from
● Prometheus Remote Write
○ Pushes metrics in Prometheus
TimeSeries format (Cortex, etc.)
Collector Contrib Exporters
Traces
● AlibabaCloud LogService
● AWS X-Ray
● Azure Monitor
● Datadog
● Elastic
● Honeycomb
● Jaeger v1 Thrift
● AWS Kinesis (Jaeger proto)
● New Relic
● SignalFX APM
● Sentry
● Stackdriver
Metrics
● AlibabaCloud LogService
● AWS CloudWatch EMF
● Carbon
● Datadog
● Elastic
● New Relic
● SignalFX
● Splunk HEC
● Stackdriver
Vendor Hosted Exporters
Traces
● Dynatrace OneAgent
● Lightstep Launchers
Metrics
● Dynatrace OneAgent
● Lightstep Launchers
Full Configuration File Example
receivers:
  otlp:
    protocols:
      grpc:
        max_recv_msg_size_mib: 32
        max_concurrent_streams: 16
        read_buffer_size: 1024
        write_buffer_size: 1024
        keepalive:
          server_parameters:
            max_connection_idle: 10s
processors:
  memory_limiter:
    ballast_size_mib: 192
    check_interval: 5s
    limit_mib: 448
    spike_limit_mib: 64
  batch:
    send_batch_size: 64
    timeout: 15s
exporters:
  jaeger:
    endpoint: jaeger.monitoring.svc.storefront-development.local.:14250
    timeout: 10s
    sending_queue:
      enabled: true
      num_consumers: 2
      queue_size: 10
    retry_on_failure:
      enabled: true
      initial_interval: 10s
      max_interval: 60s
      max_elapsed_time: 10m
  prometheusremotewrite:
    namespace: "monitoring"
    sending_queue:
      enabled: true
      num_consumers: 2
      queue_size: 10
    retry_on_failure:
      enabled: true
      initial_interval: 10s
      max_interval: 60s
      max_elapsed_time: 10m
    endpoint: ":8888"
    ca_file: "/etc/pki/tls/certs/carbon-lb.pem"
    write_buffer_size: 524288
    headers:
      Prometheus-Remote-Write-Version: "0.1.0"
      X-Scope-OrgID: 234
extensions:
  health_check:
    port: 13133
  zpages:
    endpoint: :55679
service:
  extensions: [zpages, health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [jaeger]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
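To get a feel for what the retry_on_failure values in this configuration imply (initial_interval 10s, max_interval 60s, max_elapsed_time 10m), the sketch below lists the waits between attempts. It assumes plain doubling with no jitter; the collector's actual multiplier and randomization may differ, and `retry_waits` is a hypothetical helper.

```python
from typing import List


def retry_waits(
    initial: float,
    max_interval: float,
    max_elapsed: float,
    multiplier: float = 2.0,  # assumed growth factor, not taken from the collector
) -> List[float]:
    """List the waits (in seconds) before retries stop being attempted."""
    waits: List[float] = []
    elapsed = 0.0
    wait = initial
    # Keep retrying while the next wait still fits in the elapsed-time budget.
    while elapsed + wait <= max_elapsed:
        waits.append(wait)
        elapsed += wait
        wait = min(wait * multiplier, max_interval)  # cap growth at max_interval
    return waits


waits = retry_waits(initial=10, max_interval=60, max_elapsed=600)
```

Under these assumptions the waits grow 10, 20, 40 and then stay at 60 seconds until the ten-minute budget is used up, after which the data would be dropped.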
Collector Command Line Example
/usr/local/bin/otelcol \
  --config=/usr/local/etc/otel-collector-config.yaml \
  --mem-ballast-size-mib=192 \
  --log-level=DEBUG
Collector Docker Images
● otel/opentelemetry-collector
○ Core receivers, processors, and exporters bundled in
● otel/opentelemetry-collector-contrib
○ All core and contrib receivers, processors, and exporters bundled in
● OpenTelemetry Collector builder
○ https://github.com/observatorium/opentelemetry-collector-builder
Other Collector Installs
● RPM
○ Produced by opentelemetry-collector build
● Debian
○ Produced by opentelemetry-collector build
Observing the Collector
● health_check
○ http://<hostname>:13133/ returns basic
pipeline availability
● zpages
○ RPC metric aggregations at
http://<hostname>:55679/debug/rpcz
○ Trace summaries at
http://<hostname>:55679/debug/tracez
● prometheus
○ Pipeline metrics scrape endpoint at
http://<hostname>:8888/metrics
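A small script can gather these endpoints for a given host and probe the health check. This is a minimal sketch using only the Python standard library; the function names are hypothetical, the ports are the ones listed above, and treating HTTP 200 as healthy is an assumption about the health_check response.

```python
import urllib.error
import urllib.request
from typing import Dict


def collector_endpoints(hostname: str) -> Dict[str, str]:
    """Build the observation URLs listed above for one collector host."""
    return {
        "health_check": f"http://{hostname}:13133/",
        "rpcz": f"http://{hostname}:55679/debug/rpcz",
        "tracez": f"http://{hostname}:55679/debug/tracez",
        "metrics": f"http://{hostname}:8888/metrics",
    }


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling `is_healthy(collector_endpoints("otel-1")["health_check"])` against a hypothetical host `otel-1` would give a simple liveness signal suitable for a cron job or a deployment check.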
Current Gotchas
● Errors are propagated back through pipelines and instances in the chain
○ Errors reported by SDK exporters in the applications may be coming from two hops
downstream
● TraceID sharding is not working correctly
○ Can only do tail-based sampling if running a single instance of the collector
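The reason sharding matters for tail-based sampling is that every span of a trace has to reach the same collector instance before a decision can be made. The sketch below shows the general idea of routing by TraceID hash; it is an illustration only, not a description of any existing collector feature, and the function name and instance names are hypothetical.

```python
import hashlib
from typing import List


def shard_for_trace(trace_id: str, instances: List[str]) -> str:
    """Pick the collector instance that should receive all spans of a trace."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


collectors = ["otelcol-0", "otelcol-1", "otelcol-2"]  # hypothetical instance names
first_span_target = shard_for_trace("4bf92f3577b34da6a3ce929d0e0e4736", collectors)
later_span_target = shard_for_trace("4bf92f3577b34da6a3ce929d0e0e4736", collectors)
```

Both calls return the same instance because the choice depends only on the trace ID. Note that simple modulo routing reshuffles most traces when the number of instances changes, so a real deployment would likely want a consistent-hashing scheme instead.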
Observability Platform Innovations
Latest Innovations
● Dynatrace automates manual quality validation processes using AI-
assisted SLI/SLO-based quality gates.
● New Relic Incident Intelligence continuously analyzes alerts and incident
data to find patterns in event sequences and offers suggested correlation
decisions that merge incidents to reduce alert noise further.
● Splunk SignalFX provides high cardinality exploration of traces across
different regions, hosts, versions or users.
● Lightstep provides rapid root cause analysis using unlimited cardinality
and a high-fidelity dataset uncompromised by head or tail sampling.
● Datadog provides automated tagging and correlation of logs so you can jump
from any log entry to related metrics.
● Honeycomb lets you break down on every dimension in your data, both
the obvious fields and the surprising ones.
● Grafana Loki datasource provides switching from metrics to logs with
preserved label filters.
● Elastic Observability brings your logs, metrics, and APM traces together at
scale in a single stack.
Thank you!

Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 

OpenTelemetry For Operators

  • 1. OpenTelemetry For Operators Presented by Kevin Brockhoff Apache 2.0 Licensed
  • 2. Our Agenda ● Why are current observability platforms falling short? ● What OpenTelemetry features address these issues? ● How do I run OpenTelemetry components in production? ● Who are the innovators in the observability space?
  • 3. Level Setting ● Have you used ELK stack or other log aggregator? ● Have you used an APM system? ● Have you used distributed tracing before?
  • 4. Who am I? ● Kevin Brockhoff - Senior Consultant, Daugherty Business Solutions ○ Solving difficult cloud adoption challenges for Daugherty's Fortune 500 clients ○ OpenTelemetry committer since early stages of the project ○ GitHub: https://github.com/kbrockhoff ○ LinkedIn: https://www.linkedin.com/in/kevin-brockhoff-a557877/
  • 6. Enterprise Applications ● Only instrumented with logging during initial development. ○ Logging oriented toward development, not operations ● Metrics and tracing only added later, if at all, as a separate project. ○ Each team creates their own system using familiar tools ○ Or the enterprise commits to a specific APM vendor ● Logs, metrics and traces are never connected.
  • 7. First Generation Observability Platforms ● Search logs in ELK, lack context ● Homegrown tracing per app, mainly accessible by developers ● Customer experience metrics ● Low-level metrics and alerts
  • 9. OpenCensus + OpenTracing = OpenTelemetry ● OpenTracing: ○ Provides APIs and instrumentation for distributed tracing ● OpenCensus: ○ Provides APIs and instrumentation that allow you to collect application metrics and distributed tracing. ● OpenTelemetry: ○ An effort to combine distributed tracing, metrics and logging into a single set of system components and language-specific libraries.
  • 10. OpenTelemetry Project ● Specification ○ API (for application developers) ○ SDK Implementations ○ Transport Protocol (Protobuf) ● Collector (middleware) ● SDKs (various stages of maturity) ○ C++ ○ C# (Auto-instrument/Manual) ○ Erlang ○ Go ○ JavaScript (Browser/Node) ○ Java (Auto-instrument/Manual) ■ Android compatibility ○ PHP ○ Python (Auto-instrument/Manual) ○ Ruby ○ Rust ○ Swift
  • 13. OpenTelemetry Collector ● Offers a vendor-agnostic implementation of how to receive, process and export telemetry data. ● Removes the need to run, operate and maintain multiple agents/collectors. ● Supports open-source telemetry data formats (e.g. OTLP, Jaeger, Prometheus, etc.) sending to multiple open-source or commercial back-ends.
  • 14. Collector Concepts ● Telemetry data processing pipelines ○ Per pipeline: Receiver(s) -> Processors -> Exporter(s) ○ Currently only single telemetry type pipelines supported ● Extensions ○ Supporting functionality ○ Core collector extensions ■ health_check - HTTP endpoint for load balancer or k8s controller ■ zpages - Internal processing metrics and traces accessible via HTTP ■ pprof - Performance profiler; enables the Go net/http/pprof endpoint
  • 15. Collector Bundled Receivers Traces ● Jaeger ○ Compact Thrift, Binary Thrift, HTTP, gRPC ○ Sampling strategy configuration server ● Kafka ○ OTLP, Jaeger, Zipkin data structures ● OpenCensus ● OTLP (OpenTelemetry Protocol) ○ gRPC, HTTP ● Zipkin ○ v1, v1 Thrift, v2, v2 Protobuf Metrics ● Host metrics scraper ○ cpu, disk, load, filesystem, memory, network, processes, swap, process ● Kafka ○ OTLP ● OpenCensus ● OTLP (OpenTelemetry Protocol) ○ gRPC, HTTP ● Prometheus ○ Full discovery and polling capabilities Logs ● Fluent Forward ○ Spec compliant except no mTLS
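The Receiver(s) -> Processors -> Exporter(s) flow on slide 14 can be pictured with a small model. This is only an illustrative sketch, not collector code: the Pipeline class, the dict-based telemetry items and drop_health_checks are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Batch = List[dict]  # a batch of telemetry items, modeled here as plain dicts


@dataclass
class Pipeline:
    """Hypothetical model of one single-signal collector pipeline."""
    signal: str  # "traces", "metrics" or "logs"
    processors: List[Callable[[Batch], Batch]] = field(default_factory=list)
    exporters: List[Callable[[Batch], None]] = field(default_factory=list)

    def consume(self, batch: Batch) -> None:
        # A receiver hands data in here; processors run in configured order.
        for process in self.processors:
            batch = process(batch)
        # Every exporter receives the same processed batch (fan-out).
        for export in self.exporters:
            export(batch)


def drop_health_checks(batch: Batch) -> Batch:
    # Example processor: discard spans produced by health probes.
    return [span for span in batch if span.get("name") != "/healthz"]


exported: Batch = []
pipeline = Pipeline("traces", [drop_health_checks], [exported.extend])
pipeline.consume([{"name": "/healthz"}, {"name": "/checkout"}])
print(exported)
```

Because each pipeline carries a single telemetry type, a deployment that handles traces and metrics would hold two such objects, one per signal.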
  • 16. Collector Contrib Receivers Traces ● AWS X-Ray ● SignalFX APM v1 Metrics ● AWS ECS Container ● Carbon ● CollectD (JSON only) ● Docker Stats ● Kubernetes Cluster ● Kubernetes Kubelet ● Prometheus Exporters ● Redis INFO ● SignalFX ● Splunk HEC ● StatsD ● Wavefront Logs ● SignalFX (Events) ● Stanza
  • 17. Collector Bundled Processors ● Attributes ○ Modifies span attributes ● Batch ○ Groups data into batches ● Filter ○ Include/exclude metrics by name ● Group by Trace ○ Holds all spans for a trace for a set time and then sends to next processor ● Memory Limiter ○ Prevents out-of-memory issues by triggering GC ○ Configuration must match the ballast setting the collector is launched with ● Queued Retry ○ Deprecated; each exporter now implements its own queuing and retry ● Resource ○ Applies changes to Resource attributes ● Probabilistic Sampling ○ Adjusts TraceID hash-based sampling decisions by sampling.priority attribute value ● Tail Sampling ○ Sampling decisions based on configured attribute values and rate limits ● Span ○ Modifies span name or attributes based on span name
  • 18. Recommended Processor Configuration ● Traces: memory_limiter -> any sampling processors -> batch -> any other processors ● Metrics: memory_limiter -> any filtering processors -> batch -> any other processors ● Memory limiter ballast_size_mib must match the --mem-ballast-size-mib command line parameter. ● Trigger GC with either limit_mib / spike_limit_mib or limit_percentage / spike_limit_percentage.
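The idea behind TraceID hash-based sampling with a sampling.priority override can be sketched as follows. This is an assumption-laden illustration rather than the collector's algorithm: SHA-256 stands in for whatever hash the processor really uses, should_sample is a hypothetical name, and the rule "priority above zero keeps the trace, zero drops it" is an assumed reading of the sampling.priority convention.

```python
import hashlib
from typing import Optional


def should_sample(trace_id: str, sampling_percentage: float,
                  priority: Optional[int] = None) -> bool:
    """Decide whether to keep a trace, given its hex-encoded trace ID."""
    # Assumed override: an explicit sampling.priority wins over the hash.
    if priority is not None:
        return priority > 0
    # Hash the trace ID so every span of a trace gets the same decision.
    digest = hashlib.sha256(bytes.fromhex(trace_id)).digest()
    bucket = int.from_bytes(digest[:8], "big")           # 0 .. 2**64 - 1
    threshold = int(sampling_percentage / 100.0 * 2**64)  # integer cut-off
    return bucket < threshold


trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
print(should_sample(trace_id, 25.0))
```

Deriving the decision from the trace ID alone is what lets independent collector instances agree on a trace without coordinating with each other.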
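The recommended ordering on slide 18 lends itself to a simple lint over a pipeline's processor list. The sketch below is hypothetical tooling, not part of the collector; the processor names in SAMPLING and FILTERING are assumed configuration keys and should be adjusted to the names actually in use.

```python
# Assumed configuration names for the sampling and filtering processors.
SAMPLING = {"probabilistic_sampler", "tail_sampling"}
FILTERING = {"filter"}


def order_problems(signal, processors):
    """Return human-readable deviations from the recommended order."""
    problems = []
    if not processors or processors[0] != "memory_limiter":
        problems.append("memory_limiter should be first")
    if "batch" not in processors:
        problems.append("batch processor is missing")
        return problems
    batch_at = processors.index("batch")
    # Sampling (traces) or filtering (metrics) belongs before batching.
    early = SAMPLING if signal == "traces" else FILTERING
    for name in processors[batch_at + 1:]:
        if name in early:
            problems.append(f"{name} should come before batch")
    return problems


print(order_problems("traces", ["memory_limiter", "probabilistic_sampler", "batch"]))
print(order_problems("traces", ["batch", "memory_limiter", "tail_sampling"]))
```

Placing sampling and filtering ahead of batch means the batches only contain data that will actually be exported.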
  • 19. Collector Contrib Processors ● Kubernetes ○ Adds metadata from pod ● Metrics Transform ○ Renames/aggregations within individual metrics ● Resource Detection ○ OTEL_RESOURCE environment variable ○ GCE metadata server ○ EC2 instance metadata server ● Routing ○ Route to particular exporter based on incoming header value TODO ● Span data sharding by TraceID
  • 20. Collector Bundled Exporters Traces ● File ○ JSON format ● Jaeger ○ v2 gRPC ● Kafka ○ OTLP, Jaeger, Zipkin ● Logging ○ Debugging ● OpenCensus ● OTLP (OpenTelemetry Protocol) ● Zipkin ○ v2 JSON or Protobuf Metrics ● File ○ JSON format ● Logging ○ Debugging ● OpenCensus ● OTLP (OpenTelemetry Protocol) ● Prometheus ○ Metrics endpoint for Prometheus to pull from ● Prometheus Remote Write ○ Pushes metrics in Prometheus TimeSeries format (Cortex, etc.)
  • 21. Collector Contrib Exporters Traces ● AlibabaCloud LogService ● AWS X-Ray ● Azure Monitor ● Datadog ● Elastic ● Honeycomb ● Jaeger v1 Thrift ● AWS Kinesis (Jaeger proto) ● New Relic ● SignalFX APM ● Sentry ● Stackdriver Metrics ● AlibabaCloud LogService ● AWS CloudWatch EMF ● Carbon ● Datadog ● Elastic ● New Relic ● SignalFX ● Splunk HEC ● Stackdriver
  • 22. Vendor Hosted Exporters Traces ● Dynatrace OneAgent ● Lightstep Launchers Metrics ● Dynatrace OneAgent ● Lightstep Launchers
  • 23. Full Configuration File Example
      receivers:
        otlp:
          protocols:
            grpc:
              max_recv_msg_size_mib: 32
              max_concurrent_streams: 16
              read_buffer_size: 1024
              write_buffer_size: 1024
              keepalive:
                server_parameters:
                  max_connection_idle: 10s
      processors:
        memory_limiter:
          ballast_size_mib: 192
          check_interval: 5s
          limit_mib: 448
          spike_limit_mib: 64
        batch:
          send_batch_size: 64
          timeout: 15s
      exporters:
        jaeger:
          endpoint: jaeger.monitoring.svc.storefront-development.local.:14250
          timeout: 10s
          sending_queue:
            enabled: true
            num_consumers: 2
            queue_size: 10
          retry_on_failure:
            enabled: true
            initial_interval: 10s
            max_interval: 60s
            max_elapsed_time: 10m
        prometheusremotewrite:
          namespace: "monitoring"
          sending_queue:
            enabled: true
            num_consumers: 2
            queue_size: 10
          retry_on_failure:
            enabled: true
            initial_interval: 10s
            max_interval: 60s
            max_elapsed_time: 10m
          endpoint: ":8888"
          ca_file: "/etc/pki/tls/certs/carbon-lb.pem"
          write_buffer_size: 524288
          headers:
            Prometheus-Remote-Write-Version: "0.1.0"
            X-Scope-OrgID: 234
      extensions:
        health_check:
          port: 13133
        zpages:
          endpoint: :55679
      service:
        extensions: [zpages, health_check]
        pipelines:
          traces:
            receivers: [otlp]
            processors: [memory_limiter, batch]
            exporters: [jaeger]
          metrics:
            receivers: [otlp]
            processors: [memory_limiter, batch]
            exporters: [prometheusremotewrite]
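One property of the configuration on slide 23 worth checking before deployment is that every component named in a pipeline is also defined in its top-level section. The following is a minimal sketch of such a check, assuming the YAML has already been parsed into a Python dict; undefined_components is a hypothetical helper, and the component settings are left empty for brevity.

```python
def undefined_components(config):
    """Return a message for each pipeline entry with no matching definition."""
    problems = []
    for pipeline_name, pipeline in config["service"]["pipelines"].items():
        for section in ("receivers", "processors", "exporters"):
            defined = config.get(section, {})
            for name in pipeline.get(section, []):
                if name not in defined:
                    problems.append(
                        f"{pipeline_name}: '{name}' missing from {section}")
    return problems


# Same component names as the slide; per-component settings omitted.
config = {
    "receivers": {"otlp": {}},
    "processors": {"memory_limiter": {}, "batch": {}},
    "exporters": {"jaeger": {}, "prometheusremotewrite": {}},
    "service": {"pipelines": {
        "traces": {"receivers": ["otlp"],
                   "processors": ["memory_limiter", "batch"],
                   "exporters": ["jaeger"]},
        "metrics": {"receivers": ["otlp"],
                    "processors": ["memory_limiter", "batch"],
                    "exporters": ["prometheusremotewrite"]},
    }},
}
print(undefined_components(config))
```

An empty list means every pipeline reference resolves; the collector performs its own validation at startup, so a check like this is mainly useful in a CI step before a rollout.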
  • 24. Collector Command Line Example /usr/local/bin/otelcol --config=/usr/local/etc/otel-collector-config.yaml --mem-ballast-size-mib=192 --log-level=DEBUG
  • 25. Collector Docker Images ● otel/opentelemetry-collector ○ Core receivers, processors, and exporters bundled in ● otel/opentelemetry-collector-contrib ○ All core and contrib receivers, processors, and exporters bundled in ● OpenTelemetry Collector builder ○ https://github.com/observatorium/opentelemetry-collector-builder
  • 26. Other Collector Installs ● RPM ○ Produced by opentelemetry-collector build ● Debian ○ Produced by opentelemetry-collector build
  • 27. Observing the Collector ● health_check ○ http://<hostname>:13133/ returns basic pipeline availability ● zpages ○ RPC metric aggregations at http://<hostname>:55679/debug/rpcz ○ Trace summaries at http://<hostname>:55679/debug/tracez ● prometheus ○ Pipeline metrics scrape endpoint at http://<hostname>:8888/metrics
  • 28. Current Gotchas ● Errors propagated back through pipelines and instances in the chain ○ Errors reported by SDK exporters in the applications may be coming from two hops downstream ● TraceID sharding not working correctly ○ Can only do tail-based sampling if running single instance of collector
  • 30. Latest Innovations ● Dynatrace automates manual quality validation processes using AI-assisted SLI/SLO-based quality gates. ● New Relic Incident Intelligence continuously analyzes alerts and incident data to find patterns in event sequences and offers suggested correlation decisions that merge incidents to reduce alert noise further. ● Splunk SignalFX provides high cardinality exploration of traces across different regions, hosts, versions or users. ● Lightstep provides rapid root cause analysis using unlimited cardinality and a high-fidelity dataset uncompromised by head or tail sampling.
  • 31. Latest Innovations ● Datadog provides automated tagging and correlation of logs so you can jump from any log entry to related metrics. ● Honeycomb lets you break down on every dimension in your data, both the obvious fields and the surprising ones. ● Grafana Loki datasource provides switching from metrics to logs with preserved label filters. ● Elastic Observability brings your logs, metrics, and APM traces together at scale in a single stack.
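The endpoints on slide 27 can be polled from a small script. The sketch below uses only the Python standard library; collector_urls and is_healthy are hypothetical helper names, the ports are the defaults shown on the slide, and treating an HTTP 200 from health_check as "available" is an assumption.

```python
import urllib.error
import urllib.request


def collector_urls(hostname):
    """Build the observation endpoints listed on the slide for one host."""
    return {
        "health": f"http://{hostname}:13133/",
        "rpcz": f"http://{hostname}:55679/debug/rpcz",
        "tracez": f"http://{hostname}:55679/debug/tracez",
        "metrics": f"http://{hostname}:8888/metrics",
    }


def is_healthy(hostname, timeout=2.0):
    """Return True if the health_check extension answers with HTTP 200."""
    url = collector_urls(hostname)["health"]
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout or a non-2xx status.
        return False


print(collector_urls("otel-collector.example.internal")["metrics"])
```

The same health URL is what a load balancer or Kubernetes probe would call, so a script like this is mostly useful for ad hoc checks.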
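Tail-based sampling needs every span of a trace to reach the same collector instance, which is the problem TraceID sharding is meant to solve. The sketch below shows the general idea only and is not how the collector implements it: collector_for is a hypothetical function, SHA-256 is an arbitrary hash choice, and the instance names are made up.

```python
import hashlib


def collector_for(trace_id, instances):
    """Pick the collector instance responsible for a hex-encoded trace ID."""
    digest = hashlib.sha256(bytes.fromhex(trace_id)).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


instances = ["collector-0", "collector-1", "collector-2"]
trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
# Every span of this trace maps to the same instance, whoever routes it.
print(collector_for(trace_id, instances))
```

Plain modulo hashing reshuffles most traces whenever the number of instances changes; a consistent-hashing scheme would reduce that churn at the cost of more bookkeeping.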

Editor's Notes

  1. Copyright 2020, The OpenTelemetry Authors Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.