Abstract
As part of the team delivering Snap, an open telemetry framework, I've run through dozens of use cases where gathering disparate metrics from services can roll up into meaningful diagrams for operations engineers and developers alike. We will use Snap's plugin model to collect, process and publish these measurements into meaningful graphs using open source tools. By joining this session, you can follow along and install industry-standard open source projects, deploy them and then use Snap to collect, process and visualize these metrics.
Audience
Anyone with an operations background (or a future in operations ahead of them) who wants to see the breadth of available open source tooling around telemetry. This proposal is designed for the hands-on user who is comfortable running containers or virtual machines locally.
Experience Level
Intermediate
Benefits to the Ecosystem
By joining this session, you can follow along and install industry-standard open source projects, deploy them and then use Snap to collect, process and visualize these metrics. This empowers users within the Linux ecosystem to see their knowledge as powerful when visualized next to other layers of the datacenter.
2. Briefly, About Me
Am:
@mjbrender (everywhere)
Developer Advocate, Orchestration Engineering
Pretty good at Open Source practices
Was:
Storage array performance
VMware
NoSQL
3. Loose Agenda
1. Wishful thinking of the lab config
2. What is telemetry
3. One opinion on the state of open source tooling
4. Let’s Test the Network
linuxcon.snap-telemetry.io
then
git clone
I encourage you to keep downloading stuff until you’re ready to go.
13.
what my friends think telemetry is
what my parents think telemetry is
what society thinks telemetry is
what my boss thinks telemetry is
what I think telemetry is
what telemetry actually is
14. What Is Telemetry?
Telemetry is the stuff you can measure and the process of capturing it: from the heat generated on a CPU core to the throughput of Nginx* running in a Docker* container on a Kubernetes cluster. It’s all measurable and it’s all summarized in that one word.
• Telemetry - the process of using equipment to take measurements of something and send them to another place
• Metrics - measurements of facts throughout the data center
• Analytics - the method of logical analysis that determines the consequences of information
15. What Is Telemetry?
What | How
Application Availability | ping
Operating System Performance | psutil
Hardware Utilization | Intel Performance Counter Metrics (PCM)
16. What Is Telemetry?
What | How | Why
Application Availability | ping | SLA compliance
Operating System Performance | psutil | System performance
Hardware Utilization | Intel Performance Counter Metrics (PCM) | Scaling capacity
17. What snap is and what it isn’t
Telemetry Analytics
18. What snap is and what it isn’t
Telemetry Analytics
snap
snap is a framework for metrics.
snap is NOT an analytics alternative.
19. What snap is and what it isn’t
Telemetry Analytics
Automation
Scheduling
IRO
24. Collectors in snap
Collect telemetry data once via plugins for:
• Bare metal, including Intel-specific platform metrics (CPU, NIC, BMC, SMART)
• Operating environments and existing telemetry (Docker, libvirt, psutil)
• Application services and adjacencies (Ceph, HAProxy, Etcd, Facter, MySQL, Apache)
Populate a dynamically generated, single-namespace telemetry catalog
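To make the idea of a single-namespace catalog concrete, here is a minimal Python sketch. It is not Snap's implementation or API: the `MetricCatalog` class, the namespaces and the fixed sample values are all hypothetical and exist only to illustrate how collectors from different layers could populate one shared tree.

```python
from typing import Callable, Dict, List


class MetricCatalog:
    """Toy single-namespace catalog: each metric lives under one slash-separated path."""

    def __init__(self) -> None:
        self._collectors: Dict[str, Callable[[], float]] = {}

    def register(self, namespace: str, collect: Callable[[], float]) -> None:
        # Loading a collector adds its metric to the shared catalog.
        self._collectors[namespace] = collect

    def namespaces(self, prefix: str = "/") -> List[str]:
        # Every metric, whatever its source, is discoverable from one tree.
        return sorted(ns for ns in self._collectors if ns.startswith(prefix))

    def collect(self, namespace: str) -> float:
        return self._collectors[namespace]()


catalog = MetricCatalog()
# Hypothetical namespaces; the lambdas stand in for real readings.
catalog.register("/intel/psutil/load/load1", lambda: 0.42)
catalog.register("/intel/docker/containers/count", lambda: 3.0)
catalog.register("/intel/mysql/connections", lambda: 17.0)

print(catalog.namespaces("/intel/psutil"))
```

Because everything shares one namespace, a consumer can browse by prefix without knowing which plugin produced a given metric.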
25. Processors in snap
Filter, alter or append metadata via plugins for:
• Filtering (Moving Averages)
• Normalization
• Encryption for all or part of the data set
• Injection of metadata
  • Tokens
  • Tenant IDs
Forking to one or more endpoints
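Two of the processing steps named on this slide, moving-average filtering and metadata injection, can be sketched in a few lines of Python. This is an illustration under assumed data shapes, not Snap's processor plugin interface: the function names, the dictionary layout of a metric and the `tenant_id` tag are all hypothetical.

```python
from collections import deque
from typing import Any, Deque, Dict, Iterable, List


def moving_average(values: Iterable[float], window: int) -> List[float]:
    """Return the mean of the last `window` values seen at each step."""
    recent: Deque[float] = deque(maxlen=window)
    smoothed: List[float] = []
    for value in values:
        recent.append(value)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


def inject_metadata(metric: Dict[str, Any], tags: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of the metric with extra tags (for example a tenant ID) attached."""
    enriched = dict(metric)
    enriched["tags"] = {**metric.get("tags", {}), **tags}
    return enriched


print(moving_average([1.0, 2.0, 3.0, 4.0], window=2))  # [1.0, 1.5, 2.5, 3.5]
print(inject_metadata({"namespace": "/example/load", "value": 1.5}, {"tenant_id": "t1"}))
```

Both functions return new data rather than mutating their input, so the same collected values can be handed to several processors independently.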
26. Publishers in snap
Publish data via plugins for:
• Dashboard Tools (Graphite, Grafana, Riemann)
• Queues and Logs (RabbitMQ, Kafka, File)
• Databases (PostgreSQL, InfluxDB, OpenTSDB, SAP HANA)
To one or more endpoints
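Publishing "to one or more endpoints" amounts to fanning the same batch of metrics out to several sinks. The Python sketch below assumes a metric is a plain dictionary and uses two hypothetical publishers, a JSON-lines file and an in-memory list standing in for a queue or database; none of these names come from Snap itself.

```python
import json
from typing import Any, Callable, Dict, List

Metric = Dict[str, Any]
Publisher = Callable[[List[Metric]], None]


def file_publisher(path: str) -> Publisher:
    """Build a publisher that appends one JSON document per metric to `path`."""
    def publish(metrics: List[Metric]) -> None:
        with open(path, "a", encoding="utf-8") as handle:
            for metric in metrics:
                handle.write(json.dumps(metric, sort_keys=True) + "\n")
    return publish


def memory_publisher(sink: List[Metric]) -> Publisher:
    """Build a publisher that stores metrics in a list (a stand-in for a queue or database)."""
    def publish(metrics: List[Metric]) -> None:
        sink.extend(metrics)
    return publish


def publish_all(metrics: List[Metric], publishers: List[Publisher]) -> None:
    # The same batch goes to every configured endpoint.
    for publish in publishers:
        publish(metrics)
```

A call such as `publish_all(batch, [file_publisher("/tmp/published.jsonl"), memory_publisher(buffer)])` would deliver `batch` to both sinks; the file path here is only an example.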
27. Visibility at all layers
[Diagram labels: App; OS; HW; four unknowns (?); Analytics Pipeline; Dashboards]
28. Visibility at all layers
[Diagram labels: one unknown (?); App; OS; HW; Analytics Pipeline; Dashboards]
29. Visibility at all layers
[Diagram labels: Snap; App; OS; HW; Analytics Pipeline; Dashboards]
30. Visibility at all layers
[Diagram labels: Snap; an App / OS / HW stack; an App / OS / Virtualization / HW stack; Analytics Pipeline; Dashboards]
31. Visibility at all layers
[Diagram labels: Snap; an App / OS / HW stack; an App / OS / Virtualization / HW stack; Analytics Pipeline; Dashboards]
32. Visibility at all layers
[Diagram labels: Snap; Kubernetes; two App / OS / HW stacks; Analytics Pipeline; Dashboards]
33. Visibility at all layers
[Diagram labels: Snap; Kubernetes; seven App / OS / HW stacks]
52. What Is Telemetry? (revisited)
What | How
Application Availability | ping
Operating System Performance | psutil
Hardware Utilization | Intel Performance Counter Metrics (PCM)
53. What Is Telemetry? (revisited)
What | Query | Collect | Process | Publish | Visualize
Application Availability | ping | ? | ? | ? | ?
Operating System Performance | psutil | ? | ? | ? | ?
Hardware Utilization | PCM | ? | ? | ? | ?
(Query through Visualize: "How", expanded)
54. What Is Telemetry? (revisited)
What | Query | Collect | Process | Publish | Visualize
Application Availability | ping | ? | ? | ? | ?
Operating System Performance | psutil | ? | ? | ? | ?
Hardware Utilization | PCM | ? | ? | ? | ?
(Query through Visualize: "How", expanded)
[Diagram labels: Snap; Grafana]
65. snap | Plugin Lifecycle
• Plugin load
  • Dynamic; does not require a restart
  • snap is automatically informed by the plugin of its features, metrics, and configuration details
  • Dynamically extends the metric catalog when loaded
• Plugin unload
  • Removes metrics from the catalog automatically
• Loading a new plugin automatically upgrades running workflows in tasks
  • Optionally, the collection can be pinned to a version (ex: get /intel/server/cpu/load/v1)
  • Each scheduled workflow automatically uses the most mature plugin for that step
  • Coupled with dynamic plugin loading, this results in instantaneous updates to existing workflows
  • Helpful for bug fixes, security patching, improving accuracy
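The interplay of dynamic loading, automatic upgrades and version pinning can be sketched as a small registry in Python. This is a toy model, not Snap's code: `PluginRegistry` and the plugin labels are hypothetical, and "most mature" is assumed here to mean the highest loaded version number.

```python
from typing import Dict, Optional


class PluginRegistry:
    """Toy registry that tracks loaded plugin versions per metric namespace."""

    def __init__(self) -> None:
        self._versions: Dict[str, Dict[int, str]] = {}

    def load(self, namespace: str, version: int, plugin: str) -> None:
        # Loading needs no restart: the new version is visible to the next resolve().
        self._versions.setdefault(namespace, {})[version] = plugin

    def unload(self, namespace: str, version: int) -> None:
        versions = self._versions.get(namespace, {})
        versions.pop(version, None)
        if not versions:
            # No versions left, so the metric disappears from the catalog.
            self._versions.pop(namespace, None)

    def resolve(self, namespace: str, pinned: Optional[int] = None) -> str:
        # Assumption: without a pin, the highest version number wins.
        versions = self._versions[namespace]
        chosen = pinned if pinned is not None else max(versions)
        return versions[chosen]


registry = PluginRegistry()
registry.load("/intel/server/cpu/load", 1, "cpu-collector-v1")
registry.load("/intel/server/cpu/load", 2, "cpu-collector-v2")
print(registry.resolve("/intel/server/cpu/load"))            # cpu-collector-v2
print(registry.resolve("/intel/server/cpu/load", pinned=1))  # cpu-collector-v1
```

A workflow that resolves its plugin on every run picks up a newly loaded version on its next interval, while a pinned workflow keeps using the version it named.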
66. snap | Overview – Example Workflows
Customizable definition of task and related workflow:
[Diagram: several example workflows, each combining Collect, Process and Publish steps]
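One way to picture such a workflow is as a tree: a single collect step at the root, with process and publish steps branching beneath it. The Python sketch below models that shape under stated assumptions; `Step`, `run_workflow` and `publish_to` are hypothetical names, and this is not Snap's task manifest format.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Values = List[float]


@dataclass
class Step:
    """A process or publish step; children receive this step's output."""
    run: Callable[[Values], Values]
    children: List["Step"] = field(default_factory=list)


def run_workflow(collect: Callable[[], Values], steps: List[Step]) -> None:
    """Collect once, then push the data through every branch of the tree."""
    def walk(step: Step, data: Values) -> None:
        output = step.run(data)
        for child in step.children:
            walk(child, output)

    collected = collect()
    for step in steps:
        walk(step, collected)


def publish_to(sink: Values) -> Callable[[Values], Values]:
    """Build a publish step that appends to `sink` and passes the data through."""
    def publish(data: Values) -> Values:
        sink.extend(data)
        return data
    return publish


published_raw: Values = []
published_scaled: Values = []
workflow = [
    # Collect -> Publish
    Step(publish_to(published_raw)),
    # Collect -> Process -> Publish
    Step(lambda data: [v * 100 for v in data], [Step(publish_to(published_scaled))]),
]
run_workflow(lambda: [0.25, 0.5], workflow)
print(published_raw, published_scaled)
```

Collecting once and branching afterwards means adding another publisher or processor does not add another read of the underlying system.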