Sam Dillard, Sales Engineer
Optimizing InfluxDB Performance
© 2017 InfluxData. All rights reserved.
Agenda
❖ Optimizing
➢ Hardware/Architecture
➢ Write Method
➢ Schema
❖ Q&A
Resource Utilization
• No Specific OS Tuning Required
• IOPS IOPS IOPS
• Keep CPU/memory utilization at about 70%; you need headroom for:
  – Peak periods
  – Compactions
  – Backfilling data
Telegraf
• Lightweight; written in Go
• Plug-in driven
• Optimized for writing to InfluxDB
  – Formatting
  – Retries
  – Modifiable batch sizes and jitter
  – Tag sorting
• Preprocessing
  – Converting tags to fields, fields to tags
  – Regex transformations
  – Renaming measurements, tags
  – Aggregations (mean, min, max, count, variance, stddev, etc.)
Telegraf
[Diagram: Telegraf plugin pipeline]
Inputs: CPU, Mem, Disk, Docker, Kubernetes, /metrics, Kafka, MySQL
Process: transform, decorate, filter
Aggregate: mean, min/max, count, variance, stddev
Outputs: File, InfluxDB, Kafka, CloudWatch
Parsing
● JSON
● CSV
● Graphite
● CollectD
● Dropwizard
● Form URL-encoded
● Grok
[Diagram: many Telegraf agents publish to a message queue (Kafka, RabbitMQ, ActiveMQ, NSQ, AWS Kinesis, Google PubSub, MQTT); a Telegraf consumer reads from the queue and writes to InfluxDB]
[Diagram: a fleet of Telegraf agents feeding InfluxDB]
Balanced ingestion helps....
[Figures: "Good..." vs. "Not so good..." ingestion patterns, shown on two slides]
Schema Design Goals
• By reducing...
– Measurement/tag cardinality
– Information-encoding
– Key lengths
• You increase…
– Write performance
– Query performance
– Readability
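A rough way to see the cardinality goal in practice is to count distinct series in a sample of line protocol. The sketch below assumes a series is identified by measurement plus tag set, and it uses a simplified parser that ignores escaped spaces and commas. `series_key` and `series_cardinality` are hypothetical helpers, and the sample points are made up for illustration.

```python
def series_key(line: str) -> str:
    """Return the measurement + tag set portion of a line-protocol line.

    Simplification: assumes no escaped spaces or commas in names or values.
    """
    key = line.split(" ", 1)[0]
    measurement, *tags = key.split(",")
    # Sort tags so the same tag set written in a different order counts once.
    return ",".join([measurement] + sorted(tags))


def series_cardinality(lines) -> int:
    """Count distinct measurement + tag set combinations."""
    return len({series_key(line) for line in lines if line.strip()})


# Illustrative sample data (not from a real system).
encoded = [
    "cpu.server-5.us-west.usage_user value=20.0 1444234982000000000",
    "cpu.server-5.us-west.usage_system value=5.0 1444234982000000000",
    "cpu.server-6.us-west.usage_user value=40.0 1444234982000000000",
]
tagged = [
    "cpu,host=server-5,region=us-west usage_user=20.0,usage_system=5.0 1444234982000000000",
    "cpu,host=server-6,region=us-west usage_user=40.0 1444234982000000000",
]

print(series_cardinality(encoded), series_cardinality(tagged))
```

In the encoded form every metric name becomes its own measurement, so the count grows with hosts times metrics; in the tagged form it grows only with hosts.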
“It’s a feature, not a bug...but
features require thinking”
- Richard Laskey, Wayfair
Line Protocol && Schema Insight
<measurement>,<tagset> <fieldset> <timestamp>
● A Measurement is a namespace for like metrics (comparable to a SQL table)
● What to make a Measurement?
○ Logically alike metrics; categorization
○ E.g., CPU has many metrics associated with it
○ E.g., Transactions
■ "usd", "response_time", "duration_ms", "timeout", whatever else…
● What to make a Tag?
○ Metadata; “things you need to `GROUP BY`”
● What to make a Field?
○ Actual metrics
■ Metrics you will visualize or operate on
○ Things whose values vary widely...and that you don’t need to group by
Line Protocol Goals
1) Don’t encode data into Measurements or Tags; the telltale sign is a
field key that carries no meaning (value, counter, gauge)
2) Write as many Fields per Line as you can; #1 allows for #2
3) Separate information into primitives; reduce regex grouping
4) Order Tags lexicographically
(Telegraf does all this for you, for the most part)
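If you write points yourself rather than through Telegraf, goals 2 and 4 can be handled in the serializer. The following is a minimal sketch, not a client library: `to_line` and `format_field` are hypothetical names, escaping of spaces, commas, and equals signs is omitted, and string field values are not handled.

```python
def format_field(value):
    """Render a field value in line-protocol syntax."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an 'i' suffix
    return repr(float(value))


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Serialize one point: tags sorted by key, all fields on one line."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={format_field(v)}" for k, v in fields.items())
    line = f"{measurement}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


line = to_line(
    "cpu",
    {"region": "us-west", "host": "server-5"},     # deliberately unsorted
    {"usage_user": 20.0, "usage_system": 5.0},     # several fields, one line
    1444234982000000000,
)
print(line)
```

Sorting happens once at write time on the client, so the database receives tag keys already in lexicographic order.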
DON'T ENCODE DATA INTO THE MEASUREMENT NAME
Instead of measurement names like:
cpu.server-5.us-west.usage_user value=20.0 1444234982000000000
cpu.server-6.us-west.usage_user value=40.0 1444234982000000000
mem.server-6.us-west.free value=25.0 1444234982000000000
Encode that information as tags:
cpu,host=server-5,region=us-west usage_user=20.0 1444234982000000000
cpu,host=server-6,region=us-west usage_user=40.0 1444234982000000000
mem,host=server-6,region=us-west mem_free=25.0 1444234982000000000
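For data that already arrives with dotted names, the rewrite can be done mechanically before writing. This sketch assumes every name has exactly the layout `<measurement>.<host>.<region>.<field>` and a single field called `value`; `split_encoded_name` is a hypothetical helper, not a Telegraf or InfluxDB function.

```python
def split_encoded_name(line, tag_keys=("host", "region")):
    """Move data encoded in a dotted measurement name into tags.

    Assumes '<measurement>.<host>.<region>.<field> value=X <timestamp>'.
    """
    name, field_set, timestamp = line.split(" ")
    measurement, *tag_values, field_key = name.split(".")
    if len(tag_values) != len(tag_keys):
        raise ValueError(f"unexpected name layout: {name!r}")
    tags = ",".join(f"{k}={v}" for k, v in zip(tag_keys, tag_values))
    # Replace the meaningless 'value' key with the real metric name.
    value = field_set.split("=", 1)[1]
    return f"{measurement},{tags} {field_key}={value} {timestamp}"


converted = split_encoded_name(
    "cpu.server-5.us-west.usage_user value=20.0 1444234982000000000"
)
print(converted)
```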
DON’T OVERLOAD TAGS (separate into primitives)
BAD:
cpu,server=localhost.us-west.usage_user value=2.0 1444234982000000000
cpu,server=localhost.us-east.usage_system value=3.0 1444234982000000000
GOOD: Separate out into different tags:
cpu,host=localhost,region=us-west usage_user=2.0 1444234982000000000
cpu,host=localhost,region=us-east usage_system=3.0 1444234982000000000
Use Telegraf as a Graphite parser
Graphite like:
cpu.usage.eu-west.idle.percentage 100
With a Telegraf configuration like:
[[inputs.http_listener_v2]]
  data_format = "graphite"
  separator = "_"
  templates = [
    "measurement.measurement.region.field*"
  ]
Results in following transformation:
cpu_usage,region=eu-west idle_percentage=100
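To make the template matching concrete, here is a small sketch that imitates a subset of the template behaviour: `measurement` and `field` tokens are joined with the separator, any other token becomes a tag key, and a trailing `*` absorbs the remaining path segments. It is an illustration only and not Telegraf's parser; `apply_template` is a hypothetical name, and filters, default tags, and timestamps are left out.

```python
def apply_template(template, path, value, separator="_"):
    """Map a dotted Graphite path onto measurement, tags, and a field key."""
    parts = path.split(".")
    measurement, field, tags = [], [], {}
    for i, token in enumerate(template.split(".")):
        if token.endswith("*"):
            # Greedy token: absorbs every remaining path segment.
            target = measurement if token == "measurement*" else field
            target.extend(parts[i:])
            break
        if i >= len(parts):
            break
        if token == "measurement":
            measurement.append(parts[i])
        elif token == "field":
            field.append(parts[i])
        elif token:  # any other non-empty token is treated as a tag key
            tags[token] = parts[i]
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_key = separator.join(field) or "value"
    return f"{separator.join(measurement)}{tag_part} {field_key}={value}"


result = apply_template(
    "measurement.measurement.region.field*",
    "cpu.usage.eu-west.idle.percentage",
    100,
)
print(result)
```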
stock_prices,symbol=BP price=25.0 1
stock_prices,symbol=CVX price=35.0 1
stock_prices,symbol=XOM price=45.0 1
stock_prices,symbol=XOM open=25.0,high=45.0,low=20.0,close=35.0 1
stock_prices,symbol=XOM open=20.0,high=40.0,low=20.0,close=40.0 2
Also smaller payloads:
From:
cpu,region=us-west-1,host=hostA,container=containerA usage_user=35.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_system=15.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_guest=0.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_guest_nice=0.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_idle=35.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_iowait=0.2 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_irq=0.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_nice=1.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_steal=2.0 <timestamp>
cpu,region=us-west-1,host=hostA,container=containerA usage_softirq=2.5 <timestamp>
To:
cpu,region=us-west-1,host=hostA,container=containerA usage_user=35.0,usage_system=15.0,usage_guest=0.0,usage_guest_nice=0.0,usage_idle=35.0,usage_iowait=0.2,usage_irq=0.0,usage_nice=1.0,usage_steal=2.0,usage_softirq=2.5 <timestamp>
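The same consolidation can be sketched in code: group single-field lines by series key and timestamp, then join their field sets. `merge_fields` is a hypothetical helper, the parser assumes no escaped spaces, and the timestamp stands in for the `<timestamp>` placeholder used on the slide.

```python
def merge_fields(lines):
    """Combine lines that share a series key and timestamp into one line each."""
    merged = {}
    for line in lines:
        key, field_set, timestamp = line.split(" ")
        merged.setdefault((key, timestamp), []).append(field_set)
    return [
        f"{key} {','.join(field_sets)} {timestamp}"
        for (key, timestamp), field_sets in merged.items()
    ]


series = "cpu,region=us-west-1,host=hostA,container=containerA"
ts = "1444234982000000000"  # placeholder timestamp
before = [
    f"{series} usage_user=35.0 {ts}",
    f"{series} usage_system=15.0 {ts}",
    f"{series} usage_idle=35.0 {ts}",
]
after = merge_fields(before)

# Compare payload sizes, counting one newline per line.
bytes_before = sum(len(line) + 1 for line in before)
bytes_after = sum(len(line) + 1 for line in after)
print(after[0])
print(bytes_before, bytes_after)
```

The saving comes from sending the measurement, tag set, and timestamp once instead of once per field.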
sam@influxdata.com @SDillard12
THANKS!

Optimizing Time Series Performance in the Real World