Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Optimizing InfluxDB Performance in the Real World | Sam Dillard | InfluxData

151 views

Published on

Sam will provide practical tips and techniques learned from helping hundreds of customers deploy InfluxDB and InfluxDB Enterprise. This includes hardware and architecture choices, schema design, configuration setup, and running queries.

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Optimizing InfluxDB Performance in the Real World | Sam Dillard | InfluxData

  1. 1. Sam Dillard, Sales Engineer Optimizing InfluxDB Performance
  2. 2. © 2017 InfluxData. All rights reserved.2 © 2017 InfluxData. All rights reserved.2 ❖ Optimizing ➢ Hardware/Architecture ➢ Write Method ➢ Schema ❖ Q&A Agenda
  3. 3. © 2017 InfluxData. All rights reserved.3 Resource Utilization • No Specific OS Tuning Required • IOPS IOPS IOPS • 70% cpu/mem utilization - need head room for: • Peak periods • Compactions • Backfilling data
  4. 4. © 2017 InfluxData. All rights reserved.4 © 2017 InfluxData. All rights reserved.4 ❖ Optimizing ➢ Hardware/Architecture ➢ Write Method ➢ Schema ❖ Q&A Agenda
  5. 5. © 2017 InfluxData. All rights reserved.5 © 2017 InfluxData. All rights reserved.5
  6. 6. © 2017 InfluxData. All rights reserved.6 Telegraf • Lightweight; written in Go • Plug-in driven • Optimized for writing to InfluxDB • Formatting • Retries • Modifiable batch sizes and jitter • Tag sorting • Preprocessing • Converting tags to fields, fields to tags • Regex transformations • Renaming measurements, tags • Aggregations (mean, min, max, count, variance, stddev, etc.)
  7. 7. © 2017 InfluxData. All rights reserved.7 Popular Plugins Out-of-the-box Custom Kubernetes (kubelet) HTTP/socket listener Kube_inventory (apiserver) HTTP (formatted endpoints) Kafka (consumer) Prometheus (/metrics) RabbitMQ (consumer) Exec AMQP (mq metadata) StatsD Redis Nginx HAproxy Jolokia2
  8. 8. © 2017 InfluxData. All rights reserved.8 Telegraf CPU Mem Disk Docker Kubernetes /metrics Kafka MySQL Process -transform -decorate -filter Aggregate -mean -min,max -count -variance -stddev File InfluxDB Kafka CloudWatch CloudWatch
  9. 9. © 2017 InfluxData. All rights reserved.9 Parsing ● JSON ● CSV ● Graphite ● CollectD ● Dropwizard ● Form URL-encoded ● Grok
  10. 10. © 2017 InfluxData. All rights reserved.10 Telegraf InfluxDB Telegraf Telegraf Telegraf Telegraf Telegraf Telegraf Telegraf Telegraf Telegraf Message Queue Telegraf Kafka Rabbit Active NSQ AWS Kinesis Google PubSub MQTT
  11. 11. © 2017 InfluxData. All rights reserved.11 Balanced ingestion helps....
  12. 12. © 2017 InfluxData. All rights reserved.12 Good... Not so good...
  13. 13. © 2017 InfluxData. All rights reserved.13 Good... Not so good...
  14. 14. © 2017 InfluxData. All rights reserved.14 © 2017 InfluxData. All rights reserved.14 ❖ Optimizing ➢ Hardware/Architecture ➢ Write Method ➢ Schema ❖ Q&A Agenda
  15. 15. © 2018 InfluxData. All rights reserved.15 Schema Design Goals • By reducing... – Measurement/tag cardinality – Information-encoding – Key lengths • You increase… – Write performance – Query performance – Readability
  16. 16. © 2018 InfluxData. All rights reserved.16 “It’s a feature, not a bug...but features require thinking” - Richard Laskey, Wayfair
  17. 17. © 2018 InfluxData. All rights reserved.17 Line Protocol && Schema Insight <measurement,tagset fieldset timestamp> ● A Measurement is a namespace for like metrics (SQL table) ● What to make a Measurement? ○ Logically-alike metrics; categorization ○ I.e., CPU has metrics has many metrics associated with it ○ I.e., Transactions ■ “usd”,”response_time”,”duration_ms”,”timeout”, whatever else… ● What to make a Tag? ○ Metadata; “things you need to `GROUP BY`” ● What to make a Field? ○ Actual metrics ■ Metrics you will visualize or operate on ○ Things that have high value variance...that you don’t need to group
  18. 18. © 2018 InfluxData. All rights reserved.18 Line Protocol Goals 1) Don’t encode data into Measurements or Tags; indicated by valuesless key names (value, counter, gauge) 2) Write as many Fields per Line as you can; #1 allows for #2 3) Separate information into primitives; reduce regex grouping 4) Order Tags lexicographically (Telegraf does all this for you, for the most part)
  19. 19. © 2018 InfluxData. All rights reserved.19 DON'T ENCODE DATA INTO THE MEASUREMENT NAME Measurement names like: Encode that information as tags: Cpu.server-5.us-west.usage_user value=20.0 1444234982000000000 cpu.server-6.us-west.usage_user value=40.0 1444234982000000000 mem.server-6.us-west.free value=25.0 1444234982000000000 cpu,host=server-5,region=us-west usage_user=20.0 1444234982000000000 cpu,host=server-6,region=us-west usage_user=40.0 1444234982000000000 mem,host=server-6,region=us-west mem_free=25.0 1444234982000000
  20. 20. © 2018 InfluxData. All rights reserved.20 DON’T OVERLOAD TAGS (separate into primitives) BAD GOOD: Separate out into different tags: xxx cpu,server=localhost.us-west.usage_user value=2.0 1444234982000000000 cpu,server=localhost.us-east.usage_system value=3.0 1444234982000000000 cpu,host=localhost,region=us-west usage_user=2.0 1444234982000000000 cpu,host=localhost,region=us-east usage_system=3.0 1444234982000000000
  21. 21. © 2017 InfluxData. All rights reserved.21 Use Telegraf as a Graphite parser Graphite like: cpu.usage.eu-west.idle.percentage 100 With a Telegraf configuration like: Results in following transformation: cpu_usage,region=eu-east idle_percentage=100 [[inputs.http_listener_v2]] data_format = “graphite” separator = "_" templates = [ "measurement.measurement.region.field*" ]
  22. 22. © 2018 InfluxData. All rights reserved.22
  23. 23. © 2018 InfluxData. All rights reserved.23
  24. 24. © 2017 InfluxData. All rights reserved.24 stock_prices,symbol=BP price=25.0 1 stock_prices,symbol=CVX price=35.0 1 stock_prices,symbol=XOM price=45.0 1
  25. 25. © 2017 InfluxData. All rights reserved.25 stock_prices,symbol=XOM open=25.0,high=45.0,low=20.0,close=35.0 1 stock_prices,symbol=XOM open=20.0,high=40.0,low=20.0,close=40.0 2
  26. 26. © 2018 InfluxData. All rights reserved.26 Also smaller payloads: From: cpu,region=us-west-1,host=hostA,container=containerA usage_user=35.0 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_system=15.0 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_guest=0.0 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_guest_nice=0.0 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_idle=35.0 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_iowait=0.2 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_irq=0.0 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_nice=1.0 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_steal=2.0 <timestamp> cpu,region=us-west-1,host=hostA,container=containerA usage_softirq=2.5 <timestamp> To: cpu,region=us-west-1,host=hostA,container=containerA usage_user=35.0,usage_system=15.0,usage_guest=0.0,usage_guest_nice=0.0,usage_idle=35.0,usage_iowait=0.2,usage_irq=0.0 ,usage_irq=0.0,usage_nice=1.0,usage_steal=2.0,usage_softirq=2.5 <timestamp>
  27. 27. sam@influxdata.com @SDillard12 THANKS!
  28. 28. © 2017 InfluxData. All rights reserved.28 © 2017 InfluxData. All rights reserved.28 ❖ Optimizing ➢ Hardware/Architecture ➢ Write Method ➢ Schema ➢ Queries ➢ Configuration ❖ Q&A Agenda
  29. 29. © 2017 InfluxData. All rights reserved.29 © 2017 InfluxData. All rights reserved.29 Queries and Shards • Shard durations should be longer than your longest typical query • When thinking about balancing writes/reads: High Query load Low Query Load High Write Load Balanced duration Shorter duration Bursty or Low Write Load Longer duration Balanced duration
  30. 30. © 2017 InfluxData. All rights reserved.30 © 2017 InfluxData. All rights reserved.30 Query Performance • Streaming functions > batch functions • Batch funcs • percentile(), spread(), stddev(), median(), mode(), holt-winters • Stream funcs • mean(),bottom(),first(),last(),max(),top(),count(),etc. • Distributed functions (clusters only) > local functions • Distributed • first(),last(),max(),min(),count(),mean(),sum() • Local • percentile(),derivative(),spread(),top(),bottom(),elapsed(),etc.
  31. 31. © 2017 InfluxData. All rights reserved.31 © 2017 InfluxData. All rights reserved.31 Query Performance • Boundaries! • Time-bounding and series-bounding with `WHERE` clause • `SELECT *` generally not a best practice • Agg functions instead of raw queries • `SELECT mean(<field>)` > `SELECT <field>` • Reduce `GROUP BY time` intervals • Subqueries • When appropriate, process data from an already processed subset of data • SELECT min("max") FROM (SELECT max("usage_user") FROM cpu WHERE time > now() - 5d GROUP BY time(5m))
  32. 32. © 2017 InfluxData. All rights reserved.32 © 2017 InfluxData. All rights reserved.32 Exercise ``` (1) SELECT usage_user,usage_system FROM cpu WHERE time > now() - 7d GROUP BY time(1m), host ``` ``` (2) SELECT usage_user,usage_system FROM cpu WHERE time > now() - 7d GROUP BY time(5m), host ``` ● Shard duration = 24h ● Hosts = 1,000 ● Retention = 2w Question: Which is faster?
  33. 33. © 2017 InfluxData. All rights reserved.33 © 2017 InfluxData. All rights reserved.33 Answer ● (1) = 10,080 time buckets * 2 Fields * 1,000 hosts = 20,160,000 series-time groups ● (2) = 2,016 time buckets * 2 Fields * 1,000 hosts = 4,032,000 series-time groups
  34. 34. © 2017 InfluxData. All rights reserved.34 © 2017 InfluxData. All rights reserved.34 Exercise ``` (1) SELECT usage_user,usage_system FROM cpu WHERE time > now() - 1d GROUP BY time(1m), host, container_id,app ``` ``` (2) SELECT * FROM docker WHERE time > now() - 1d GROUP BY time(5m), host ``` ● Shard duration = 7d ● Hosts = 10,000 ● Retention = 2w Question: Which is faster?
  35. 35. © 2017 InfluxData. All rights reserved.35 © 2017 InfluxData. All rights reserved.35 Answer ● It depends. Docker measurement has 100+ Fields. ● `SELECT * from query (2) is going to grab every single one of those from every single host vs. the mere 2 from query (1)
  36. 36. © 2017 InfluxData. All rights reserved.36 © 2017 InfluxData. All rights reserved.36 ❖ Optimizing ➢ Hardware/Architecture ➢ Write Method ➢ Schema ➢ Queries ➢ Configuration ❖ Q&A Agenda
  37. 37. © 2017 InfluxData. All rights reserved.37 Tuning Parameters • Max-concurrent-queries • Max-select-point • Max-select-series • Rate limiting compactions • Max-concurrent-compactions • Compact-full-write-cold-duration • Compact-throughput • Compact-throughput-burst
  38. 38. © 2017 InfluxData. All rights reserved.38 Tuning Parameters • Cache-max-memory-size • Cache-snapshot-memory-size • Cache-snapshot-write-cold-duration • Max-series-per-database • Max-values-per-tag • Fine Grained Auth instead of multiple databases
  39. 39. sam@influxdata.com @SDillard12 THANKS!

×