
Sam Dillard [InfluxData] | Performance Optimization in InfluxDB | InfluxDays Virtual Experience London 2020


As in my past talks on this topic, I will give a rundown of the different levers you can pull to make InfluxDB perform better for your use case. Each iteration of the talk adds new slides.

Most of the presentation focuses on the write path, since that is what defines the schema and, ultimately, how queries will perform against the database.

Published in: Technology

  1. 1. Sam Dillard, Senior Sales Engineer Optimizing InfluxDB Performance
  2. 2. Agenda ❖ Optimizing ➢ Hardware/Architecture ➢ Write Method ➢ Schema ❖ Q&A
  3. 3. © 2020 InfluxData. All rights reserved. Resource Utilization • No specific OS tuning required • IOPS IOPS IOPS • Target about 70% CPU/memory utilization; you need headroom for: • Peak periods • Compactions • Backfilling data
  4. 4. Agenda ❖ Optimizing ➢ Hardware/Architecture ➢ Write Method ➢ Schema ❖ Q&A
  6. 6. Telegraf • Lightweight; written in Go • Plug-in driven • Optimized for writing to InfluxDB • Formatting • Retries • Modifiable batch sizes and jitter • Tag sorting • Preprocessing • Converting tags to fields, fields to tags • Regex transformations • Renaming measurements, tags • Aggregations (mean, min, max, count, variance, stddev, etc.)
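
The "modifiable batch sizes and jitter" bullet can be pictured with a minimal Python sketch. This is only an illustration of the idea, not Telegraf's implementation; the function names `batches` and `jittered_interval` are hypothetical.

```python
import random


def batches(points, batch_size):
    """Yield successive lists of at most batch_size points."""
    for start in range(0, len(points), batch_size):
        yield points[start:start + batch_size]


def jittered_interval(flush_interval_s, max_jitter_s):
    """Return the flush interval plus a random delay in [0, max_jitter_s].

    Spreading flushes out this way keeps many agents from writing
    at exactly the same instant.
    """
    return flush_interval_s + random.uniform(0, max_jitter_s)


if __name__ == "__main__":
    for batch in batches(list(range(5)), 2):
        print(batch)
    print(jittered_interval(10.0, 5.0))
```

Larger batches mean fewer write requests; jitter trades a little latency for smoother load on the database.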
  7. 7. Popular Plugins • Out-of-the-box: Kubernetes (kubelet), Kube_inventory (apiserver), Kafka (consumer), SNMP, AMQP (mq metadata), Redis, Nginx, HAproxy, Jolokia2 • Custom: HTTP/socket listener, HTTP (formatted endpoints), Prometheus (/metrics), Exec, StatsD
  8. 8. [Diagram: Telegraf pipeline] • Inputs: CPU, Mem, Disk, Docker, Kubernetes, /metrics, Kafka, MySQL, CloudWatch • Process: transform, decorate, filter • Aggregate: mean, min/max, count, variance, stddev • Outputs: File, InfluxDB, Kafka, CloudWatch
  9. 9. Parsing ● JSON ● CSV ● Graphite ● CollectD ● Dropwizard ● Form URL-encoded ● Grok
  10. 10. [Diagram: many Telegraf agents, a message queue (Kafka, Rabbit, Active, NSQ, AWS Kinesis, Google PubSub, MQTT), Telegraf, and InfluxDB]
  11. 11. [Diagram: many Telegraf agents and InfluxDB]
  12. 12. Balanced ingestion helps....
  13. 13. Good... Not so good...
  14. 14. Good... Not so good...
  15. 15. Agenda ❖ Optimizing ➢ Hardware/Architecture ➢ Write Method ➢ Schema ❖ Q&A
  16. 16. Schema Design Goals • By reducing... – Measurement/tag cardinality – Information-encoding – Key lengths • You increase… – Write performance – Query performance – Readability
  17. 17. “It’s a feature, not a bug...but features require thinking” - Richard Laskey, Wayfair
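
A minimal Python sketch of what "reducing measurement cardinality" means in practice: count the distinct measurement names that a set of line protocol lines would create. The helper names are hypothetical, and the parsing assumes no escaped commas or spaces in the measurement name.

```python
def measurement_of(line):
    """Return the measurement name of a line protocol line.

    Assumes the measurement contains no escaped commas or spaces.
    """
    head = line.split(" ", 1)[0]   # "measurement,tagset"
    return head.split(",", 1)[0]   # "measurement"


def count_measurements(lines):
    """Count distinct measurement names across the given lines."""
    return len({measurement_of(line) for line in lines})


encoded = [
    "cpu.server-5.us-west.usage_user value=20.0 1444234982000000000",
    "cpu.server-6.us-west.usage_user value=40.0 1444234982000000000",
    "mem.server-6.us-west.free value=25.0 1444234982000000000",
]
tagged = [
    "cpu,host=server-5,region=us-west usage_user=20.0 1444234982000000000",
    "cpu,host=server-6,region=us-west usage_user=40.0 1444234982000000000",
    "mem,host=server-6,region=us-west mem_free=25.0 1444234982000000000",
]

if __name__ == "__main__":
    print(count_measurements(encoded), count_measurements(tagged))
```

With data encoded into the name, every host and metric combination becomes its own measurement; with tags, the count stays at the number of logical categories.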
  18. 18. Line Protocol && Schema Insight <measurement>,<tagset> <fieldset> <timestamp> ● A Measurement is a namespace for like metrics (comparable to a SQL table) ● What to make a Measurement? ○ Logically alike metrics; categorization ○ E.g., CPU has many metrics associated with it ○ E.g., Transactions ■ "usd", "response_time", "duration_ms", "timeout", whatever else… ● What to make a Tag? ○ Metadata; "things you need to `GROUP BY`" ● What to make a Field? ○ Actual metrics ■ Metrics you will visualize or operate on ○ Things that have high value variance...that you don’t need to group by
  19. 19. Line Protocol Goals 1) Don’t encode data into Measurements or Tags; a telltale sign is valueless field key names (value, counter, gauge) 2) Write as many Fields per Line as you can; #1 allows for #2 3) Separate information into primitives; this reduces regex grouping 4) Order Tags lexicographically (Telegraf does all this for you, for the most part)
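
Goals 2 and 4 can be sketched in a few lines of Python. The function `format_line` is hypothetical and deliberately minimal: it assumes float field values only and does no escaping of spaces, commas, or equals signs, both of which a real client library would handle.

```python
def format_line(measurement, tags, fields, timestamp_ns):
    """Build one line protocol line with sorted tags and many fields.

    Assumptions of this sketch: field values are floats, and no key or
    value needs escaping.
    """
    # Goal 4: emit tags in lexicographic key order.
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    # Goal 2: put every field for this series and timestamp on one line.
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    print(format_line(
        "cpu",
        {"region": "us-west", "host": "server-5"},
        {"usage_user": 20.0, "usage_system": 5.0},
        1444234982000000000,
    ))
```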
  20. 20. DON'T ENCODE DATA INTO THE MEASUREMENT NAME Measurement names like: cpu.server-5.us-west.usage_user value=20.0 1444234982000000000 cpu.server-6.us-west.usage_user value=40.0 1444234982000000000 mem.server-6.us-west.free value=25.0 1444234982000000000 Encode that information as tags: cpu,host=server-5,region=us-west usage_user=20.0 1444234982000000000 cpu,host=server-6,region=us-west usage_user=40.0 1444234982000000000 mem,host=server-6,region=us-west mem_free=25.0 1444234982000000000
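
The rewrite on this slide can be sketched as a small Python transformation. The function name `dotted_to_tagged` is hypothetical, and the sketch assumes the exact four-segment layout used in the slide (`measurement.host.region.field`) with a single `value=` field.

```python
def dotted_to_tagged(line):
    """Move host and region out of a dotted measurement name into tags.

    Assumes the name has exactly four dot-separated segments
    (measurement.host.region.field) and the line carries one field.
    """
    name, value_part, timestamp = line.split(" ")
    measurement, host, region, field = name.split(".")
    value = value_part.split("=", 1)[1]
    return f"{measurement},host={host},region={region} {field}={value} {timestamp}"


if __name__ == "__main__":
    print(dotted_to_tagged(
        "cpu.server-5.us-west.usage_user value=20.0 1444234982000000000"
    ))
```

Note that the sketch keeps the original field segment as the field key, so the `mem` line would come out with `free` rather than the slide's `mem_free`.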
  21. 21. DON’T OVERLOAD TAGS (separate into primitives) BAD: cpu,server=localhost.us-west.usage_user value=2.0 1444234982000000000 cpu,server=localhost.us-east.usage_system value=3.0 1444234982000000000 GOOD: Separate out into different tags: cpu,host=localhost,region=us-west usage_user=2.0 1444234982000000000 cpu,host=localhost,region=us-east usage_system=3.0 1444234982000000000
  22. 22. Use Telegraf as a Graphite parser Graphite like: cpu.usage.eu-west.idle.percentage 100 With a Telegraf configuration like: [[inputs.http_listener_v2]] data_format = "graphite" separator = "_" templates = [ "measurement.measurement.region.field*" ] Results in the following transformation: cpu_usage,region=eu-west idle_percentage=100
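
To make the template mapping on this slide concrete, here is a rough Python sketch of how a template such as `measurement.measurement.region.field*` could be applied to a Graphite path. It is an illustration only, not Telegraf's parser: `apply_template` is a hypothetical name, only the `measurement`, `field`, tag-name, and trailing `*` keywords are handled, and the fallback field key `value` is an assumption of this sketch.

```python
def apply_template(graphite_line, template, separator="_"):
    """Map a Graphite 'path value' line to line protocol using a template."""
    path, value = graphite_line.split()[:2]
    parts = path.split(".")
    measurement, field, tags = [], [], {}
    for i, key in enumerate(template.split(".")):
        if i >= len(parts):
            break
        if key.endswith("*"):
            # A trailing '*' is greedy: it takes every remaining segment.
            target = measurement if key[:-1] == "measurement" else field
            target.extend(parts[i:])
            break
        if key == "measurement":
            measurement.append(parts[i])
        elif key == "field":
            field.append(parts[i])
        else:
            tags[key] = parts[i]  # any other keyword becomes a tag key
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_key = separator.join(field) or "value"  # fallback is an assumption
    return f"{separator.join(measurement)}{tag_part} {field_key}={value}"


if __name__ == "__main__":
    print(apply_template(
        "cpu.usage.eu-west.idle.percentage 100",
        "measurement.measurement.region.field*",
    ))
```

Repeated `measurement` segments are joined with the separator, which is how `cpu` and `usage` become `cpu_usage`.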
  25. 25. stock_prices,symbol=BP price=25.0 1 stock_prices,symbol=CVX price=35.0 1 stock_prices,symbol=XOM price=45.0 1
  26. 26. stock_prices,symbol=XOM open=25.0,high=45.0,low=20.0,close=35.0 1 stock_prices,symbol=XOM open=20.0,high=40.0,low=20.0,close=40.0 2
  27. 27. Also smaller payloads: From: cpu,region=us-west-1,host=hostA,container=containerA usage_user=35.0 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_system=15.0 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_guest=0.0 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_guest_nice=0.0 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_idle=35.0 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_iowait=0.2 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_irq=0.0 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_nice=1.0 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_steal=2.0 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_softirq=2.5 <timestamp> To: cpu,region=us-west-1,host=hostA,container=containerA usage_user=35.0,usage_system=15.0,usage_guest=0.0,usage_guest_nice=0.0,usage_idle=35.0,usage_iowait=0.2,usage_irq=0.0,usage_nice=1.0,usage_steal=2.0,usage_softirq=2.5 <timestamp>
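
The From/To rewrite on this slide can be sketched in Python as a merge of single-field lines that share a series and timestamp. `merge_lines` is a hypothetical helper, and the parsing assumes no escaped spaces and no string field values containing spaces.

```python
def merge_lines(lines):
    """Combine lines with the same series and timestamp into one line each.

    Assumes every line is 'series fields timestamp' with no embedded spaces.
    """
    merged = {}
    for line in lines:
        series, fields, timestamp = line.split(" ")
        merged.setdefault((series, timestamp), []).append(fields)
    return [
        f"{series} {','.join(field_sets)} {timestamp}"
        for (series, timestamp), field_sets in merged.items()
    ]


if __name__ == "__main__":
    single = [
        "cpu,host=hostA usage_user=35.0 1",
        "cpu,host=hostA usage_system=15.0 1",
        "cpu,host=hostA usage_idle=35.0 1",
    ]
    combined = merge_lines(single)
    print(combined)
    # Compare payload sizes in bytes before and after merging.
    print(len("\n".join(single)), len("\n".join(combined)))
```

The saving comes from sending the measurement, tag set, and timestamp once instead of once per field.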
  28. 28. Agenda ❖ Optimizing ➢ Hardware/Architecture ➢ Write Method ➢ Schema ➢ Queries ➢ Configuration ❖ Q&A
  29. 29. Query Performance • Streaming functions > batch functions • Batch funcs • percentile(), spread(), stddev(), median(), mode(), holt_winters() • Stream funcs • mean(), bottom(), first(), last(), max(), top(), count(), etc. • Distributed functions (clusters only) > local functions • Distributed • first(), last(), max(), min(), count(), mean(), sum() • Local • percentile(), derivative(), spread(), top(), bottom(), elapsed(), etc.
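
One way to see why streaming functions tend to be cheaper than batch functions is a generic Python sketch, which says nothing about InfluxDB's internals. A mean can be updated one value at a time with constant memory, while a median needs the whole set of values before it can answer. The names `StreamingMean` and `batch_median` are hypothetical.

```python
import statistics


class StreamingMean:
    """Running mean that keeps only a count and a sum."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def push(self, value):
        self.count += 1
        self.total += value

    def result(self):
        return self.total / self.count


def batch_median(values):
    """Median over a fully materialized list of values."""
    buffered = list(values)  # every value must be held in memory first
    return statistics.median(buffered)


if __name__ == "__main__":
    mean = StreamingMean()
    for v in [1.0, 2.0, 6.0]:
        mean.push(v)
    print(mean.result(), batch_median([1.0, 2.0, 6.0]))
```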
  30. 30. Query Performance • Boundaries! • Time-bounding and series-bounding with `WHERE` clause • `SELECT *` generally not a best practice • Agg functions instead of raw queries • `SELECT mean(<field>)` > `SELECT <field>` • Reduce `GROUP BY time` intervals • Subqueries • When appropriate, process data from an already processed subset of data • SELECT min("max") FROM (SELECT max("usage_user") FROM cpu WHERE time > now() - 5d GROUP BY time(5m))
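
The subquery on this slide computes a maximum per time window and then takes the minimum of those maxima. A small Python sketch of the same semantics follows; it mimics what the query returns, not how InfluxDB executes it, it leaves out the `WHERE time > now() - 5d` bound, and `min_of_window_max` is a hypothetical name.

```python
def min_of_window_max(points, window_s):
    """Return the smallest per-window maximum.

    points: iterable of (unix_seconds, value) pairs.
    window_s: window length in seconds (300 for GROUP BY time(5m)).
    """
    window_max = {}
    for ts, value in points:
        bucket = ts - ts % window_s  # start of the window this point falls in
        window_max[bucket] = max(value, window_max.get(bucket, value))
    return min(window_max.values())


if __name__ == "__main__":
    samples = [(0, 1.0), (100, 5.0), (300, 2.0), (400, 3.0)]
    print(min_of_window_max(samples, 300))
```

The outer aggregate only ever sees one value per window, which is the "already processed subset" the slide refers to.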
  31. 31. sam@influxdata.com @SDillard12 THANKS!
