Complex and simple ways to write to InfluxDB
Krystof Borkovec
IT-CM-IS

Structure of the talk
1) Use case
2) What I have tried
3) Current solution

1) Use case
• Grafana dashboard for HTCondor users
• Shows resource usage stats from cgroups
  • CPU, memory, I/O
• Live, at ~10 s intervals
• Simplifies advanced job debugging and optimization
• Searchable by job ID (HTCondor global job ID)
• ~100K job slots (i.e. cgroups)
• InfluxDB schema:
  • Example measurement name: cpu
  • Example tags: global_job_id, host, slot, user
  • Example fields: avg_system, avg_user, …
• Not yet known whether InfluxDB can handle this load
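For illustration, a point in this schema could be serialized to InfluxDB line protocol roughly as follows. This is a minimal sketch, not code from the talk: the helper names, the job ID, the slot name, the user and the field values are all made up, and the timestamp is assumed to be in seconds (which requires precision=s on the write request).

```python
def escape_tag(value: str) -> str:
    """Escape the separators that line protocol reserves in tag values."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def format_point(measurement, tags, fields, timestamp_s):
    """Build one line-protocol record: measurement,tags fields timestamp."""
    # Tags are sorted by key, which InfluxDB recommends for write performance.
    tag_part = ",".join(
        f"{key}={escape_tag(str(value))}" for key, value in sorted(tags.items())
    )
    # All fields are written as floats in this sketch.
    field_part = ",".join(f"{key}={float(value)}" for key, value in fields.items())
    return f"{measurement},{tag_part} {field_part} {timestamp_s}"


# Hypothetical example values, not taken from a real job.
line = format_point(
    "cpu",
    {
        "global_job_id": "schedd.example.org#1234.0#1500000000",
        "host": "myhost.cern.ch",
        "slot": "slot1_2",
        "user": "alice",
    },
    {"avg_system": 1.5, "avg_user": 42.0},
    1329168255,
)
print(line)
```

Note that every distinct global_job_id creates a new series, which is why the load question on this slide matters.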

2) What I have tried - technologies
• Others suggested using cAdvisor and collectd.
• A partial solution based on them already existed, but it
didn’t allow searching by job ID.
• That implies the following pipeline:
cgroups > cAdvisor > cadvisor-collectd plugin >
collectd > collectd WriteGraphite plugin > InfluxDB
Graphite input templates > InfluxDB > Grafana
• There is no official collectd plugin for InfluxDB, so we
pretend to send the data to Graphite.

2) What I have tried - Graphite
The cadvisor-collectd plugin has to emit data in collectd
format, i.e. each record is identified by:
(host, plugin, plugin_instance, type, type_instance)
For example:
(myhost.cern.ch, cpu, 2, cpu, idle)
The collectd WriteGraphite plugin uses this identifier to construct
records of the following form (metric path, value, timestamp):
myhost_cern_ch.cpu-2.cpu-idle 98.6103 1329168255
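The mapping from a collectd identifier to a Graphite record can be sketched as below. This only mimics the naming shown on this slide; it is not the WriteGraphite implementation, and the function name is hypothetical. Replacing dots in the host name with underscores is inferred from the example record.

```python
def graphite_line(host, plugin, plugin_instance, type_, type_instance, value, timestamp):
    """Format a collectd-style identifier as a Graphite plaintext record."""

    def with_instance(name, instance):
        # collectd joins a name and its optional instance with a dash.
        return f"{name}-{instance}" if instance else name

    # Dots separate path components in Graphite, so the dots inside the
    # host name are replaced to keep the host as a single component.
    escaped_host = host.replace(".", "_")
    path = ".".join(
        [
            escaped_host,
            with_instance(plugin, plugin_instance),
            with_instance(type_, type_instance),
        ]
    )
    return f"{path} {value} {timestamp}"


record = graphite_line("myhost.cern.ch", "cpu", "2", "cpu", "idle", 98.6103, 1329168255)
print(record)
```

Everything that should later become a tag has to be squeezed into these five identifier components.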

2) What I have tried - Graphite
• If you send a metric named
myhost_cern_ch.cpu-2.cpu-idle, InfluxDB will store the full
metric name as the measurement, with no tags extracted.
• To extract tags, you can define templates in the InfluxDB
configuration:
  • With the template measurement.cpu_number.cpu_stat,
  InfluxDB will store the record in a measurement named
  myhost_cern_ch with tags cpu_number=cpu-2 and
  cpu_stat=cpu-idle
• This pushes you to misuse the structure of the collectd record to get the
desired schema.
• Limited flexibility, fixed set of templates, no programmatic control.
• Templates live on the InfluxDB server, so they are difficult to debug and change!
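The template mechanism described above can be approximated in a few lines. This is a simplified sketch of the idea, with a hypothetical function name. The real InfluxDB Graphite input also supports filters, wildcards and default tags, none of which are handled here.

```python
def apply_template(template, metric):
    """Split a Graphite metric path into a measurement and tags using a template."""
    measurement_parts = []
    tags = {}
    # Pair each dot-separated template keyword with the matching path component.
    for keyword, component in zip(template.split("."), metric.split(".")):
        if keyword == "measurement":
            measurement_parts.append(component)
        elif keyword:
            # Any other non-empty keyword becomes a tag key.
            tags[keyword] = component
    return ".".join(measurement_parts), tags


measurement, tags = apply_template(
    "measurement.cpu_number.cpu_stat", "myhost_cern_ch.cpu-2.cpu-idle"
)
print(measurement, tags)
```

The sketch shows why the result is awkward: the host name ends up as the measurement name, and the tag values keep their collectd prefixes (cpu-2, cpu-idle).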

2) What I have tried - enrichment
On top of the schema problems with WriteGraphite,
I had problems with enrichment, i.e. adding the job ID (taken from the
output of the condor_who command) to the cgroups data:
• Could I add the job ID in the cadvisor-collectd plugin?
  • No, it is HTCondor-specific and shouldn’t be there.
• Could I add the job ID in collectd via multiple plugins?
  • cadvisor-collectd > WriteCSV
  • Exec plugin
    • Runs the condor_who command to get the job slot to job ID mapping
    • Reads the CSV file written by the cadvisor-collectd plugin
    • Merges the cgroups stats with the job slot to job ID mapping
  • No, that is hackish; there is no elegant way to enrich data in
  collectd.
• Could I add the job ID in a Flume morphline?
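Whichever component performs it, the enrichment step itself is a small join between per-slot stats and a slot to job ID mapping. The sketch below assumes both inputs are already available as dictionaries. Parsing the condor_who output is left out because its format is not shown in this talk, and all names and values are hypothetical.

```python
def enrich(stats_by_slot, job_id_by_slot):
    """Attach the job ID to each slot's cgroup stats; skip slots without a job."""
    enriched = []
    for slot, stats in stats_by_slot.items():
        job_id = job_id_by_slot.get(slot)
        if job_id is None:
            # The slot is idle or the mapping is stale, so there is nothing to tag.
            continue
        enriched.append({"slot": slot, "global_job_id": job_id, **stats})
    return enriched


# Hypothetical inputs: stats read from cgroups, mapping derived from condor_who.
stats_by_slot = {
    "slot1_1": {"avg_user": 42.0, "avg_system": 1.5},
    "slot1_2": {"avg_user": 0.0, "avg_system": 0.0},
}
job_id_by_slot = {"slot1_1": "schedd.example.org#1234.0#1500000000"}

records = enrich(stats_by_slot, job_id_by_slot)
print(records)
```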

2) What I have tried - Flume
Because:
• people were talking about using Flume
• I had problems with enriching data in collectd
• I had problems with getting the correct schema with Graphite plugin
I tried to add Flume into the pipeline to enrich the data and get
more flexibility with respect to the schema.
cgroups > cAdvisor > cadvisor-collectd plugin >
collectd > collectd chains > collectd WriteHTTP
plugin > Flume agent > Flume morphlines > Flume
Write HTTP Sink > InfluxDB > Grafana

2) What I have tried - Flume
So I tried to understand how these technologies fit together:
• cgroups
• cAdvisor
• collectd
  • cadvisor plugin
  • WriteHTTP plugin
  • collectd chains
• Flume
  • Flume morphlines
  • HMRC Flume Write HTTP Sink
• InfluxDB
  • HTTP API
• Grafana
• Related Puppet modules

3) Current solution - cgs
The resulting pipeline with Flume:
• adds another 900 LOC in Java and 3 threads
• works, but is insanely complex
I realized that:
• I don’t actually need all those things (cAdvisor, collectd, Flume).
• I have spent a huge amount of time understanding and fixing things I don’t need.
• Things that should be easy are complex with collectd and Flume.
• Lxplus has the same problem with cgroups data enrichment.
So I wrote CGroups Simple (https://gitlab.cern.ch/batch-team/cgs):
• 1200 LOC in Python
• Reads cgroup files directly
• Writes directly to the InfluxDB HTTP API through the Python requests library
• Much simpler and easier to maintain
• Can be used by both batch and lxplus (and others)
• Generic: can write to any destination and can be turned into a collectd plugin if needed
• Ignacio Reguero contributed and extended it for lxplus accounting
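The direct approach can be sketched as follows. This is not code from cgs. It is a minimal illustration that assumes a cgroup v1 cpuacct.stat file and an InfluxDB 1.x /write endpoint. The function names, the field names (raw user and system counters) and the example values are assumptions.

```python
from pathlib import Path

# Characters that must be escaped in line-protocol tag values.
_TAG_ESCAPES = str.maketrans({",": r"\,", "=": r"\=", " ": r"\ "})


def parse_cpuacct_stat(text):
    """Parse cgroup v1 cpuacct.stat content, i.e. "user N" and "system N" lines."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats


def read_cpu_stats(cgroup_dir):
    """Read the counters for one cgroup directory (the path layout is site-specific)."""
    return parse_cpuacct_stat((Path(cgroup_dir) / "cpuacct.stat").read_text())


def to_line(measurement, tags, fields, timestamp_s):
    """Serialize integer counters as one InfluxDB line-protocol record."""
    tag_part = ",".join(
        f"{key}={str(value).translate(_TAG_ESCAPES)}" for key, value in sorted(tags.items())
    )
    # The trailing "i" marks the field as an integer in line protocol.
    field_part = ",".join(f"{key}={value}i" for key, value in fields.items())
    return f"{measurement},{tag_part} {field_part} {timestamp_s}"


def write_lines(base_url, database, lines):
    """POST records to the InfluxDB 1.x write endpoint."""
    import requests  # third-party; imported here so the helpers above work without it

    response = requests.post(
        f"{base_url}/write",
        params={"db": database, "precision": "s"},
        data="\n".join(lines).encode("utf-8"),
        timeout=5,
    )
    response.raise_for_status()


# Example with literal file content instead of a real cgroup directory.
stats = parse_cpuacct_stat("user 1200\nsystem 300\n")
line = to_line("cpu", {"host": "myhost.cern.ch", "slot": "slot1"}, stats, 1329168255)
print(line)
```

Because the job ID is just another entry in the tags dictionary, enrichment needs no extra component in this design.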

My personal takeaway
• I have wasted a lot of time trying to understand technologies I didn’t need.
• When simple things get insanely complex, step back and think about simplicity.
• Other people are likely to face the same problem – no simple way to enrich
data with collectd/Flume – we should have a common solution.

Thank you for your attention.
Questions & feedback welcome!