Intro to Open Source Telemetry, LinuxCon 2016

LinuxCon 2016
An introduction to datacenter
telemetry using open source tools
Matt Brender (@mjbrender)

Briefly, About Me
Am:
@mjbrender (everywhere)
Developer Advocate,
Orchestration Engineering
Pretty good at Open Source
practices
Was:
Storage array performance
VMware
NoSQL

Loose Agenda
1. Wishful thinking of the lab config
2. What is telemetry
3. One opinion on the state of open source tooling

Let’s Test the Network
linuxcon.snap-telemetry.io
then
git clone
I encourage you to keep downloading stuff until you’re ready to go.

High Level View
Grafana + InfluxDB
Snap (“Admin”)   Snap (“Production”)

Less High Level View
Your Laptop
Ubuntu 16.04
Vagrant
Ubuntu 16.04   Ubuntu 16.04

Your Laptop
Ubuntu 16.04
Vagrant
Ansible
Snap   Docker   Snap

Your Laptop
Ubuntu 16.04
Vagrant
Ansible
Snap   Docker   Snap
Compose
InfluxDB   Grafana

Snap
collectd
StatsD
telegraf
beats
Logstash
diamond
InfluxDB
OpenTSDB
KairosDB
Graphite
Prometheus
ElasticSearch
Bosun
Grafana
Sensu
Ganglia
RRDtool
Nagios
Facette
Vector (Netflix)

what my friends think telemetry is
what my parents think telemetry is
what society thinks telemetry is
what my boss thinks telemetry is
what I think telemetry is
what telemetry actually is

What Is Telemetry?
Telemetry is the stuff you can measure and the process of capturing it: from the heat
generated on a CPU core to the throughput of Nginx* running in a Docker* container on a
Kubernetes cluster. It’s all measurable and it’s all summarized in that one word.
• Telemetry - the process of using equipment to take measurements of something and
send them to another place
• Metrics - measurements of facts throughout the data center
• Analytics - the method of logical analysis that determines the consequences of
information

What Is Telemetry?
What                          How
Application Availability      ping
Operating System Performance  psutil
Hardware Utilization          Intel Performance Counter Monitor (PCM)

What Is Telemetry?
What                          How                                      Why
Application Availability      ping                                     SLA compliance
Operating System Performance  psutil                                   System performance
Hardware Utilization          Intel Performance Counter Monitor (PCM)  Scaling capacity
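
The first two rows of the table can be approximated with the Python standard library alone. This is a minimal sketch, not what ping or psutil do internally: a TCP connect stands in for ping (ICMP usually needs elevated privileges), and os.getloadavg() stands in for psutil on Unix-like systems. The names check_availability and os_load are made up for illustration.

```python
import os
import socket
import time


def check_availability(host, port, timeout=1.0):
    """Return (is_up, latency_seconds) for one TCP connect attempt.

    Assumption: a successful TCP connect is an acceptable stand-in
    for "the application is available".
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Refused, unreachable, or timed out: treat as unavailable.
        return False, None


def os_load():
    """Return the 1, 5, and 15 minute load averages (Unix-like systems only)."""
    return os.getloadavg()
```

The "Why" column is what you do with the result: availability checks feed SLA reporting, and load averages feed system performance tracking.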

What snap is and what it isn’t
Telemetry Analytics

Telemetry Analytics
snap
snap is a framework for metrics.
snap is NOT an analytics alternative.

Telemetry Analytics
Automation
Scheduling
IRO

The Watcher Workflow
collect → process → publish
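
The collect, process, publish sequence can be sketched as three plain functions chained together. This is an illustration of the idea only, not Snap's plugin API; the function names, the /example/... namespaces, and the "source" tag are all hypothetical.

```python
import time


def collect():
    """Hypothetical collector: return raw (namespace, value, timestamp) tuples."""
    now = time.time()
    return [
        ("/example/load/load1", 0.42, now),
        ("/example/load/load5", 0.40, now),
    ]


def process(metrics):
    """Hypothetical processor: reshape each metric and attach a tag."""
    return [
        {"namespace": ns, "value": value, "timestamp": ts, "tags": {"source": "demo"}}
        for ns, value, ts in metrics
    ]


def publish(metrics, sink):
    """Hypothetical publisher: append to an in-memory sink (a list)."""
    sink.extend(metrics)


def run_once(sink):
    """Run one pass of the workflow: collect, then process, then publish."""
    publish(process(collect()), sink)
```

A scheduler would call run_once on an interval; that part is left out here.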

Collectors in snap
Collect telemetry data once via plugins for:
• Bare metal, including Intel-specific platform metrics (CPU, NIC, BMC, SMART)
• Operating environments and existing telemetry (Docker, libvirt, psutil)
• Application services and adjacencies (Ceph, HAProxy, Etcd, Facter, MySQL, Apache)
Populate a dynamically generated single-namespace telemetry catalog
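
One way to picture a single-namespace catalog that grows as collectors are loaded is a map from metric namespace to the plugin that provides it. The sketch below is an assumption about the concept, not Snap's implementation; MetricCatalog and its methods are hypothetical names, and the namespaces are the ones shown elsewhere in this deck.

```python
class MetricCatalog:
    """Toy metric catalog: one shared namespace, filled in by plugins."""

    def __init__(self):
        self._owners = {}  # metric namespace -> name of the providing plugin

    def load_plugin(self, plugin_name, namespaces):
        """Register every namespace a plugin says it can collect."""
        for namespace in namespaces:
            self._owners[namespace] = plugin_name

    def unload_plugin(self, plugin_name):
        """Drop every namespace that the named plugin provided."""
        self._owners = {
            ns: owner for ns, owner in self._owners.items() if owner != plugin_name
        }

    def list_metrics(self, prefix="/"):
        """Return the known namespaces under a prefix, sorted."""
        return sorted(ns for ns in self._owners if ns.startswith(prefix))


catalog = MetricCatalog()
catalog.load_plugin("psutil", ["/intel/psutil/load/load1", "/intel/psutil/load/load5"])
catalog.load_plugin("pcm", ["/intel/pcm/EXEC"])
```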

Processors in snap
Filter, alter or append metadata via plugins for:
• Filtering (Moving Averages)
• Normalization
• Encryption for all or part of the data set
• Injection of metadata
  • Tokens
  • Tenant IDs
Forking to one or more endpoints
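
As an example of the filtering case, a moving average smooths a stream of values by averaging the most recent few. This is a generic sketch of the technique, not the code of any Snap processor plugin; the class name and window parameter are hypothetical.

```python
from collections import deque


class MovingAverage:
    """Simple moving average over the last `window` values."""

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque with maxlen drops the oldest value automatically.
        self._values = deque(maxlen=window)

    def process(self, value):
        """Add a value and return the average of the current window."""
        self._values.append(value)
        return sum(self._values) / len(self._values)


smoother = MovingAverage(window=3)
smoothed = [smoother.process(v) for v in [3, 6, 9, 12]]
# smoothed == [3.0, 4.5, 6.0, 9.0]
```

Until the window fills, the average covers only the values seen so far.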

Publishers in snap
Publish data via plugins for:
• Dashboard Tools (Graphite, Grafana, Riemann)
• Queues and Logs (RabbitMQ, Kafka, File)
• Databases (PostgreSQL, InfluxDB, OpenTSDB, SAP HANA)
To one or more endpoints
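
Publishing "to one or more endpoints" amounts to handing the same batch of metrics to each publisher in turn. The sketch below assumes two stand-in endpoints, a text stream written as JSON lines and an in-memory list; real publishers for InfluxDB, Kafka, and so on would replace them. All function names are hypothetical.

```python
import io
import json


def publish_to_stream(metrics, stream):
    """Write one JSON object per line to any file-like object."""
    for metric in metrics:
        stream.write(json.dumps(metric, sort_keys=True) + "\n")


def publish_to_all(metrics, publishers):
    """Fan the same batch out to every publisher callable."""
    for publish in publishers:
        publish(metrics)


batch = [
    {"namespace": "/intel/psutil/load/load1", "value": 0.42},
    {"namespace": "/intel/psutil/load/load5", "value": 0.40},
]
log = io.StringIO()   # stands in for a file publisher
memory = []           # stands in for a database publisher
publish_to_all(batch, [lambda m: publish_to_stream(m, log), memory.extend])
```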

Visibility at all layers
[Diagram build: App, OS, and HW layers on one side, an Analytics Pipeline and Dashboards on the other, with question marks in between. The question marks are then replaced by Snap. Later builds add a Virtualization layer, then Kubernetes, and finally many App/OS/HW stacks alongside Snap and Kubernetes.]

• REST & CLI
• Flexible Scheduling
• Caching
• Security
• Plugin Lifecycle Management
• Worker Queues
• Metric Catalog
• Tribe
Warning: Thought Leadership Ahead

Monitoring is
Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications

Monitoring is
Telemetry: Collect, Process, Publish, Schedule, Automate

Monitoring is
Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications
Snap

Monitoring is
Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications
Grafana

Better Thought Leadership
by @obscurify by @caskey
https://github.com/mjbrender/what-we-talk-about-when-we-talk-about-telemetry

FAQ
I don’t need telemetry, I have ____________.

FAQ
I don’t need telemetry, I have Graphite.

Monitoring is
Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications
Graphite

FAQ
We run Nagios for monitoring.

Monitoring is
Telemetry, Alerts, Persistence, Learning, Visualization, Logging, Notifications
Nagios

What Is Telemetry? (revisited)
What                          How
Application Availability      ping
Operating System Performance  psutil
Hardware Utilization          Intel Performance Counter Monitor (PCM)

How Expanded
What                          Query   Collect  Process  Publish  Visualize
Application Availability      ping    ?        ?        ?        ?
Operating System Performance  psutil  ?        ?        ?        ?
Hardware Utilization          PCM     ?        ?        ?        ?

How Expanded
What                          Query   Collect  Process  Publish  Visualize
Application Availability      ping    ?        ?        ?        ?
Operating System Performance  psutil  ?        ?        ?        ?
Hardware Utilization          PCM     ?        ?        ?        ?
Snap   Grafana

Next Up
Start using Snap!
• snap-telemetry.io
• github.com/intelsdi-x
Find me:
• on The Geek Whisperers
• and @mjbrender

Everything is Challenging At Scale

Scaling with Tribe
Define as a tribe

Scaling with Tribe
Add new task

snap | What’s next?
Physical/Virtual Host
Scheduler
Processing
Publishing
Collection

snap | What’s next?
Physical/VM Host (×6)
Collection (×3)
Scheduler
Processing   Publishing

snap | Plugin Lifecycle
• Plugin load
  • Dynamic, does not require a restart
  • snap is automatically informed by the plugin of its features, metrics, and configuration details
  • Dynamically extends the metric catalog when loaded
• Plugin unload
  • Removes metrics from the catalog automatically
• Loading a new plugin automatically upgrades running workflows in tasks
  • Optionally, the collection can be pinned to a version (ex: get /intel/server/cpu/load/v1)
• Each scheduled workflow automatically uses the most mature plugin for that step
  • Coupled with dynamic plugin loading, this results in instantaneous updates to existing workflows
  • Helpful for bug fixes, security patching, improving accuracy
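
The upgrade and pinning behavior can be sketched as a registry that resolves a plugin name to a version at each run. This is a toy model, not Snap's code: PluginRegistry is a hypothetical name, and "most mature" is assumed here to mean the highest loaded version number.

```python
class PluginRegistry:
    """Toy registry: several versions of a plugin can be loaded at once."""

    def __init__(self):
        self._plugins = {}  # plugin name -> {version number: callable}

    def load(self, name, version, plugin):
        """Load a plugin version without restarting anything."""
        self._plugins.setdefault(name, {})[version] = plugin

    def unload(self, name, version):
        """Remove one version; forget the plugin when none remain."""
        versions = self._plugins[name]
        del versions[version]
        if not versions:
            del self._plugins[name]

    def resolve(self, name, pinned_version=None):
        """Return (version, plugin): the pinned version if given,
        otherwise the highest version loaded (assumed "most mature")."""
        versions = self._plugins[name]
        version = pinned_version if pinned_version is not None else max(versions)
        return version, versions[version]


registry = PluginRegistry()
registry.load("cpu", 1, lambda: "v1 reading")
registry.load("cpu", 2, lambda: "v2 reading")
```

Because resolve is called on every run, a workflow that is not pinned picks up version 2 as soon as it is loaded, while a workflow pinned to version 1 keeps using it.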

snap | Overview – Example Workflows
Customizable definition of task and related workflow:
• Collect → Publish
• Collect → Publish, Publish
• Collect → Process → Publish
• Collect → (Process → Publish), (Process → Publish)
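
A workflow with forks can be modeled as a tree: one collect step at the root, with process and publish steps as children. The sketch below is a generic illustration and does not follow Snap's task manifest format; the dictionary keys ("collect", "kind", "name", "children") and the plugin names are hypothetical.

```python
def run_node(node, data, plugins, published):
    """Run one workflow node, then its children, depth first."""
    if node["kind"] == "process":
        # A processor transforms the data before passing it on.
        data = plugins[node["name"]](data)
    elif node["kind"] == "publish":
        # A publisher records what it was given, under its own name.
        published.append((node["name"], data))
    for child in node.get("children", []):
        run_node(child, data, plugins, published)


def run_task(workflow, plugins):
    """Collect once, then push the data through every branch."""
    published = []
    data = plugins[workflow["collect"]]()
    for child in workflow.get("children", []):
        run_node(child, data, plugins, published)
    return published


plugins = {
    "numbers": lambda: [1, 2, 3],
    "double": lambda xs: [x * 2 for x in xs],
    "negate": lambda xs: [-x for x in xs],
}
# One collector forking into two process-then-publish branches.
workflow = {
    "collect": "numbers",
    "children": [
        {"kind": "process", "name": "double",
         "children": [{"kind": "publish", "name": "file"}]},
        {"kind": "process", "name": "negate",
         "children": [{"kind": "publish", "name": "db"}]},
    ],
}
results = run_task(workflow, plugins)
# results == [("file", [2, 4, 6]), ("db", [-1, -2, -3])]
```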

The Catalog
snapctl metric list
/intel/psutil/load/load1
/intel/psutil/load/load5
/intel/psutil/vm/available
/intel/pcm/EXEC
/intel/pcm/FREQ
/intel/linux/docker/cpu_stats/throttling_data/periods
/intel/server/health/score
/intel/haproxy/info/MaxConnRate
Plugins shown: Intel PCM, psutil, HAProxy, Docker, Intel Health
