Let's Compare: A Benchmark review of InfluxDB and Elasticsearch

In this webinar, Ivan K will compare the performance and features of InfluxDB and Elasticsearch for common time-series workloads, specifically looking at the rates of data ingestion, on-disk data compression, and query performance. Come hear about how Ivan conducted his tests to determine which time-series db would best fit your needs. We will reserve 15 minutes at the end of the talk for you to ask Ivan directly about his test processes and independent viewpoint.

Published in: Internet

  1. Benchmarking InfluxDB vs Elasticsearch. Nov 28, 2017 | Bonitoo.io
  2. Presenters: Vlasta Hajek, Tomas Klapka, Ivan Kudibal. Bonitoo.io is an independent third-party company from Prague.
  3. Mission. Perform benchmarks for the latest releases of InfluxDB and Elasticsearch, refreshing the benchmarking effort first conducted in 2016. Use the existing testing framework: https://github.com/influxdata/influxdb-comparisons Elasticsearch (ES) is part of the ELK stack and is used primarily as storage for logs. InfluxDB is a time-series database, designed to store and query time-series measurements.
  4. Structure of This Webinar. 1. Introduction to the influxdb-comparisons framework. 2. Demo of influxdb-comparisons. 3. Benchmarking and report. 4. Conclusion and Q&A.
  5. Part 1: Introduction to the influxdb-comparisons framework
  6. Ingestion Use Case - Dataset. DevOps: what a system administrator would see when operating hundreds of VMs. Measurements: CPU, Kernel, Memory, Disk Space, Disk IO, Network, Nginx, PostgreSQL, Redis. Flat distribution: about 101 total values per host = 9 measurements * 11.2 fields per measurement on average (9 * 11.2 = 100.8). Scalability: the framework can generate datasets of arbitrary size. Concurrency: 4 workers were used to write data to the database.
  7. Test Cases. Ingestion rate: how much data can be written to the database, measured in values per second; higher is better. Data size: how much disk space the database uses, measured in MB; smaller is better. Query performance: how quickly the database answers queries, measured in queries per second; higher is better.
  8. Benchmarking steps
     1. Generate load data: native wire format for each database.
     2. Perform bulk load: send the wire data in bulk, using the fasthttp library.
     3. Generate query data: native format for each database, with different conditions and intervals.
     4. Perform query benchmarking: run a large number of queries.
     5. Validate results: manually, using the print-response options.
  9. Ingestion Use Case - Data Example. In InfluxDB line protocol (for the mem point below: 10 tags and 9 fields, which means 9 values). Each line consists of: measurement, tag set (key=value pairs), field set (key=value pairs), timestamp.
     mem,hostname=host_0,region=ap-southeast-1,datacenter=ap-southeast-1a,rack=41,os=Ubuntu16.04LTS,arch=x64,team=NYC,service=3,service_version=1,service_environment=staging total=8589934592i,available=2614084849i,used=5975849742i,free=8490032246i,cached=4112564952i,buffered=5975849742i,used_percent=69.5680470913698770,available_percent=30.4319529086301230,buffered_percent=52.1234427531896287 1451627990000000000
     net,hostname=host_0,region=ap-southeast-1,datacenter=ap-southeast-1a,rack=41,os=Ubuntu16.04LTS,arch=x64,team=NYC,service=3,service_version=1,service_environment=staging,interface=eth3 total_connections_received=107964i,expired_keys=108036i,evicted_keys=108015i,keyspace_hits=107977i,keyspace_misses=10762i,instantaneous_ops_per_sec=10871i,instantaneous_input_kbps=10824i,instantaneous_output_kbps=10783i 1451627990000000000
     postgresl,hostname=host_0,region=ap-southeast-1,datacenter=ap-southeast-1a,rack=41,os=Ubuntu16.04LTS,arch=x64,team=NYC,service=3,service_version=1,service_environment=staging numbackends=1000i,xact_commit=1000i,xact_rollback=1000i,blks_read=1000i,blks_hit=1000i,tup_returned=1000i,tup_fetched=1000i,tup_inserted=1000i,tup_updated=1000i,tup_deleted=1000i,conflicts=1000i,temp_files=1000i,temp_bytes=2210901i,deadlocks=1000i,blk_read_time=1000i,blk_write_time=1000i 1451627990000000000
  10. Ingestion Use Case - Data Example. In the Elasticsearch bulk load protocol, each document is preceded by an action line:
     { "index" : { "_index" : "mem", "_type" : "point" } }
     {"hostname": "host_0", "region": "ap-southeast-1", "datacenter": "ap-southeast-1a", "rack": "37", "os": "Ubuntu16.10", "arch": "x86", "team": "LON", "service": "15", "service_version": "1", "service_environment": "production", "total": 8589934592, "available": 7075541242, "used": 1514393349, "free": 2339695268, "cached": 1764179373, "buffered": 1514393349, "used_percent": 17.6298589211701504, "available_percent": 82.3701410788298460, "buffered_percent": 79.4622490432259241, "timestamp": 1451627950000 }
  11. Query performance. Maximum CPU usage for 1 host, over the course of an hour, in 1-minute intervals. InfluxDB examples:
     SELECT max(usage_user) FROM cpu WHERE (hostname = 'host_73') AND time >= '2016-01-01T19:24:45Z' AND time < '2016-01-01T20:24:45Z' GROUP BY time(1m)
     SELECT max(usage_user) FROM cpu WHERE (hostname = 'host_79') AND time >= '2016-01-01T11:14:49Z' AND time < '2016-01-01T12:14:49Z' GROUP BY time(1m)
  12. Query Example - Elasticsearch
  13. Part 2: Demo. Try the influxdb-comparisons framework
  14. Part 3: Benchmarking and Reports
  15. Configuration. Elasticsearch: almost default installation, with the recommended memory setup applied (half of the total memory for Elasticsearch). InfluxDB: default installation.
  16. Elasticsearch - Index templates. Default template: disabled the _all field. Aggregation template: disabled the _source and _all fields; indexed the timestamp and tag fields.
  17. Hardware Used. In our tests we found no performance reason to prefer physical hardware over cloud-based VMs. Cloud-based VMs: AWS c4.4xlarge, Intel Xeon E5-2666 v3 2.9GHz, 16 vCPU, 30GB RAM, 1x EBS Provisioned IOPS SSD (6000 IOPS, 120GB). Physical hardware: HP, Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, 32 vCPU (2x8 cores, 2 threads per core), 32GB RAM, 300GB SCSI 15000rpm.
  18. Results
      Metric                           InfluxDB 1.4   Elasticsearch 5.6.3   Elasticsearch 5.6.3
                                                      default template      aggregation template
      Ingestion rate (values/second)   1,432,630      189,767               143,711
      Peak CPU usage                   500%           1,200%                1,500%
      Size on disk                     145MB          1.9GB                 540MB
      Query rate (queries/second)      820            712                   1,019
      Write/Query: 100 servers, 100 values, 10s, 24h, 4 workers
      AWS c4.4xlarge: Xeon E5-2666 v3 2.9GHz, 16 vCPU, 30GB RAM, 1x EBS Provisioned IOPS SSD
  19. Vertical Scalability. Elasticsearch 5.6.3: the input rate stays almost the same despite using more clients; CPU allocation is similar for 4 or 32 clients (11-13 of 16 cores). InfluxDB 1.4: the input rate grows with the number of clients; CPU allocation grows with the number of clients (8-15 of 16 cores).
  20. Part 4: Conclusions and Q&A
  21. Conclusions. InfluxDB is the winner on data ingestion and saves the most disk storage; it scales vertically. Elasticsearch is the fastest only at answering queries, and InfluxDB is still performant there. The TICK Stack better fits the use case of monitoring a fleet of VMs: 1. Excellent performance. 2. High TTV: zero-effort setup and maintenance, and a solution that needs less storage. 3. Scalability: the bigger the fleet, the less the average effort per VM.
  22. Takeaways. The technical paper and blog post for this webinar will be posted by the end of this week. For a detailed report, visit the blog and download the technical paper. Try influxdb-comparisons yourselves, and post issues in the repository: https://github.com/influxdata/influxdb-comparisons Questions? Contact us directly at info@bonitoo.io
  23. Q&A
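
The line-protocol layout described on slide 9 (measurement, tag set, field set, timestamp) can be sketched in a few lines of Python. This is only an illustration, not the generator used by influxdb-comparisons: the function name `to_line_protocol` is hypothetical, and the sketch assumes tag and field values contain no spaces, commas, or equals signs, so it does no escaping.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag_set field_set timestamp.

    Hypothetical helper for illustration; it does no escaping of
    special characters and handles only int and float field values.
    """
    tag_set = ",".join(f"{key}={value}" for key, value in tags.items())
    field_parts = []
    for key, value in fields.items():
        if isinstance(value, bool):
            raise TypeError("boolean fields are not handled in this sketch")
        if isinstance(value, int):
            # Integer fields carry a trailing "i", as in the slide's example.
            field_parts.append(f"{key}={value}i")
        else:
            field_parts.append(f"{key}={value}")
    field_set = ",".join(field_parts)
    return f"{measurement},{tag_set} {field_set} {timestamp_ns}"


point = to_line_protocol(
    "mem",
    {"hostname": "host_0", "region": "ap-southeast-1"},
    {"total": 8589934592, "used_percent": 69.5},
    1451627990000000000,
)
print(point)
```

The point here uses only two tags and two fields to keep the output short; the slide's mem example carries 10 tags and 9 fields in the same layout.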
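
The Elasticsearch bulk format on slide 10 pairs every document with an action line, one JSON object per line. A minimal sketch of building such a request body follows. The helper name `to_bulk_body` is an assumption made for illustration, and the `_type` value "point" is taken from the slide (it applies to the Elasticsearch 5.x release benchmarked here).

```python
import json


def to_bulk_body(index, docs, doc_type="point"):
    """Build a newline-delimited bulk body: action line, then document.

    Hypothetical helper for illustration. The body ends with a newline,
    which the bulk endpoint requires after the last line.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


body = to_bulk_body(
    "mem",
    [{"hostname": "host_0", "total": 8589934592, "timestamp": 1451627950000}],
)
print(body, end="")
```

Sending the body over HTTP is left out on purpose; the slides only state that the framework uses the fasthttp library for that step.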
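
The two InfluxDB queries on slide 11 differ only in host name and time window, which suggests they come from a template. The sketch below reproduces that shape; `max_cpu_query` and its parameters are hypothetical names, and plain string formatting is used only because the inputs are generated test data, not user input.

```python
from datetime import datetime, timedelta, timezone


def max_cpu_query(hostname, start, hours=1, interval="1m"):
    """Return an InfluxQL query for max CPU usage of one host.

    Hypothetical helper for illustration; the window is [start, start + hours).
    """
    end = start + timedelta(hours=hours)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (
        f"SELECT max(usage_user) FROM cpu "
        f"WHERE (hostname = '{hostname}') "
        f"AND time >= '{start.strftime(fmt)}' AND time < '{end.strftime(fmt)}' "
        f"GROUP BY time({interval})"
    )


query = max_cpu_query(
    "host_73", datetime(2016, 1, 1, 19, 24, 45, tzinfo=timezone.utc)
)
print(query)
```

With these arguments the function yields the first query shown on slide 11.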
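
To put the results on slide 18 in relative terms, the following sketch computes how InfluxDB compares with each Elasticsearch configuration. The figures are copied from the slide; the only assumption is that 1.9GB is read as 1,900MB (decimal units), and the names `results` and `ratio_vs_influxdb` are illustrative.

```python
# Figures from slide 18. Assumption: 1.9GB is taken as 1,900MB.
results = {
    "InfluxDB 1.4": {"ingest": 1_432_630, "disk_mb": 145, "queries": 820},
    "ES 5.6.3 default": {"ingest": 189_767, "disk_mb": 1900, "queries": 712},
    "ES 5.6.3 aggregation": {"ingest": 143_711, "disk_mb": 540, "queries": 1019},
}


def ratio_vs_influxdb(metric, other):
    """Return the InfluxDB value divided by the other system's value."""
    return results["InfluxDB 1.4"][metric] / results[other][metric]


for name in ("ES 5.6.3 default", "ES 5.6.3 aggregation"):
    ingest = ratio_vs_influxdb("ingest", name)
    # For disk, a smaller size is better, so invert the ratio.
    disk = 1 / ratio_vs_influxdb("disk_mb", name)
    queries = ratio_vs_influxdb("queries", name)
    print(f"{name}: ingest x{ingest:.1f}, disk x{disk:.1f}, queries x{queries:.1f}")
```

Under that reading, InfluxDB ingests roughly 7.5 to 10 times faster and uses roughly 3.7 to 13 times less disk, while its query rate is about 0.8 times that of the aggregation template, which matches the conclusions on slide 21.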
