Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

InfluxDB 1.0 - Optimizing InfluxDB by Sam Dillard

268 views

Published on

Learn how to optimize InfluxDB 1.0 for performance including hardware and architecture choices, schema design, configuration setup, and running queries. In this InfluxDays NYC 2019 presentation, Sam Dillard provides numerous actionable tips and insights into InfluxDB optimization.

Published in: Technology
  • Login to see the comments

  • Be the first to like this

InfluxDB 1.0 - Optimizing InfluxDB by Sam Dillard

  1. 1. Sam Dillard, Solutions Architect Getting Started Series Optimizing InfluxDB Performance
  2. 2. © 2017 InfluxData. All rights reserved.2 © 2017 InfluxData. All rights reserved.2 ✓ Optimizing • Hardware/Architecture • Schema • Configuration • Queries ✓ Q&A Agenda
  3. 3. Hardware & Architecture
  4. 4. © 2017 InfluxData. All rights reserved.4 Resource Utilization • No Specific OS Tuning Required • OSS • 70% cpu utilization - need head room for • Peak periods • Compactions • Supporting node failure (clusters only) • Backfilling data
  5. 5. © 2017 InfluxData. All rights reserved.5 © 2017 InfluxData. All rights reserved.5
  6. 6. © 2017 InfluxData. All rights reserved.6 Telegraf • Lightweight and written in Go • Plug-in driven • Optimized for writing to InfluxDB • Formatting • Retries • Modifiable batch sizes and jitter • Tag sorting • Preprocessing • Converting tags to fields, fields to tags • Regex transformations • Renaming measurements, tags • Aggregate • Mean,min,max,count,variance,stddev • Histogram • ValueCounter (i.e., for HTTP response codes)
  7. 7. © 2017 InfluxData. All rights reserved.7 Telegraf CPU Mem Disk Docker Kubernetes /metrics Kafka MySQL Process -transform -decorate -filter Aggregate -mean -min,max -count -variance -stddev File InfluxDB Kafka CloudWatch CloudWatch
  8. 8. © 2017 InfluxData. All rights reserved.8 Telegraf InfluxDB Telegraf Telegraf Telegraf Telegraf Telegraf Telegraf Telegraf Telegraf Telegraf Telegraf Telegraf Telegraf Message Queue Telegraf
  9. 9. Schema
  10. 10. © 2017 InfluxData. All rights reserved.10 Regard the Shard • Shards are defined by durations • Longer • Better frontend performance (writes & reads) • More efficient compactions • Shorter • More manageable • More efficient drops (happens everytime RPs are enforced) • More efficient for moving/copying • More efficient for recording incremental backups
  11. 11. © 2018 InfluxData. All rights reserved.11 Schema Design Goals • By reducing... – Cardinality – Information-encoding – Key lengths • You increase… – Write performance – Query performance – Readability
  12. 12. © 2018 InfluxData. All rights reserved.12 © 2017 InfluxData. All rights reserved.12 Data Format ● Points are written to InfluxDB using Line Protocol, which follows this below format: <measurement>,tag-key=tag-value field-key=field-value <timestamp> ● Punctuation is paramount! cpu_usage,host=server02,az=us-west-1b user=25.0,system=55.0 <timestamp> Measurement Tag set Field set Space Space
  13. 13. © 2018 InfluxData. All rights reserved.13 © 2017 InfluxData. All rights reserved.13 Schema Insight ● A Measurement is a namespace for like metrics (SQL table) ● What to make a Measurement? ○ Logically-alike metrics ○ I.e., CPU has metrics has many metrics associated with it ○ I.e., Transactions ■ “usd”,”response_time”,”duration_ms”,”timeout”, whatever else… ● What to make a Tag? ○ Metadata that describe unique sources of metrics...or assets ○ Often this translates to “things you need to `GROUP BY`” ● What to make a Field? ○ Actual metrics ○ More specifically? ■ Things you need to do math on or other operations ■ Things that have high value variance...that you don’t need to group
  14. 14. © 2018 InfluxData. All rights reserved.14 DON'T ENCODE DATA INTO THE MEASUREMENT NAME Measurement names like: Encode that information as tags: Cpu.server-5.us-west.usage_user value=20.0 1444234982000000000 cpu.server-6.us-west.usage_user value=40.0 1444234982000000000 mem-free.server-6.us-west.usage_user value=25.0 1444234982000000000 cpu,host=server-5,region=us-west usage_user=20.0 1444234982000000000 cpu,host=server-6,region=us-west usage_user=40.0 1444234982000000000 mem-free,host=server-6,region=us-west usage_user=25.0 1444234982000000
  15. 15. © 2018 InfluxData. All rights reserved.15 DON’T OVERLOAD TAGS BAD GOOD: Separate out into different tags: xxx cpu,server=localhost.us-west value=2 1444234982000000000 cpu,server=localhost.us-east value=3 1444234982000000000 cpu,host=localhost,region=us-west value=2 1444234982000000000 cpu,host=localhost,region=us-east value=3 1444234982000000000
  16. 16. © 2018 InfluxData. All rights reserved.16 © 2018 InfluxData. All rights reserved.16 Cardinality per database • The number of unique measurement, tag set, and field key combinations in an InfluxDB instance. • In practice we generally just care about the tagset For example: Measurement: • http Tags: • host -- 35 possible hosts • appName -- 10 possible apps • datacenter -- 4 possible DCs Fields: • responseCode • responseTime Assuming each app can reside on each host, total possible series cardinality for this measurement is 1*(35 * 10) + 2 = 352 Note 1: data center is a dependent tag since each host can only reside in 1 data center therefore adding data center as a tag does not increase cardinality Note 2: all hosts probably don’t support all 10 apps. Therefore, “real” cardinality is likely less than 353.
  17. 17. Configuration
  18. 18. © 2017 InfluxData. All rights reserved.18 TSI - How does it help • Memory use: • TSI-->30M series = 1-2GB • In-mem-->30M series = inconceivable • Startup performance. Startup time should be insignificant, even for very large indexes. • The tsi1 index has its own index, while the inmem index is shared across all shards. • Write performance--tsi1 index will only need to consult index files relevant to the hot data.
  19. 19. © 2017 InfluxData. All rights reserved.19 Tuning Parameters • Max-concurrent-queries • Max-select-point • Max-select-series • Rate limiting compactions • Max-concurrent-compactions • Compact-full-write-cold-duration • Compact-throughput (new in v1.7!) • Compact-throughput-burst (new in v1.7!)
  20. 20. © 2017 InfluxData. All rights reserved.20 Tuning Parameters • Cache-max-memory-size • Cache-snapshot-memory-size • Cache-snapshot-write-cold-duration • Max-series-per-database • Max-values-per-tag • Fine Grained Auth instead of multiple databases
  21. 21. Queries
  22. 22. © 2017 InfluxData. All rights reserved.22 © 2017 InfluxData. All rights reserved.22 Queries and Shards • Shard durations should be longer than your longest typical query • When thinking about balancing writes/reads: High Query load Low Query Load High Write Load Balanced duration Shorter duration Bursty or Low Write Load Longer duration Balanced duration
  23. 23. © 2017 InfluxData. All rights reserved.23 © 2017 InfluxData. All rights reserved.23 Query Performance • Streaming functions > batch functions • Batch funcs • percentile(), spread(), stddev(), median(), mode(), holt-winters • Stream funcs • mean(),bottom(),first(),last(),max(),top(),count(),etc. • Distributed functions (clusters only) > local functions • Distributed • first(),last(),max(),min(),count(),mean(),sum() • Local • percentile(),derivative(),spread(),top(),bottom(),elapsed(),etc.
  24. 24. © 2017 InfluxData. All rights reserved.24 © 2017 InfluxData. All rights reserved.24 Query Performance • Boundaries! • Time-bounding and series-bounding with `WHERE` clause • `SELECT *` generally not a best practice • Agg functions instead of raw queries • `SELECT mean(<field>)` > `SELECT <field>` • Subqueries • When appropriate, process data from an already processed subset of data • SELECT SUM("max") FROM (SELECT MAX("water_level") FROM "h2o_feet" WHERE time > now() - 1d GROUP BY "location")
  25. 25. © 2018 InfluxData. All rights reserved.25 DON'T CREATE TOO MANY LOGICAL CONTAINERS Or rather, don’t write to too many databases: • Dozens of databases should be fine • Hundreds might be okay • Thousands probably aren't without careful design Too many databases leads to more open files, more query iterators in RAM, and more shards expiring. Expiring shards have a non-trivial RAM and CPU cost to clean up the indices.
  26. 26. sam@influxdata.com @SDillard12

×