
Xephon-K: A time series database with multiple backends


Xephon-K is a time series database that uses Cassandra as its main backend. We discuss how to model time series data in Cassandra and compare its throughput with InfluxDB and KairosDB.

Published in: Software


  1. Xephon-K: A lightweight TSDB with multiple backends. Pinglei Guo, https://github.com/xephonhq/xephon-k
  2. Agenda ● Overview ● Time Series Data Revisited ● Time Series Database state of the art ● Xephon-K Design ● Xephon-K Implementation ● Evaluation ● Lessons learned ● Related & Future work ● Conclusion
  3. Overview ● Written in Golang (1,700 LOC including bench and test) ● Uses Cassandra as main backend ● Simple data model ● It is working
  4. Time Series Data Revisited: NOT just data with a timestamp. ‘What happened, happened and couldn’t have happened another way’ - The Matrix
  5. Time Series Data Revisited
     Single record, updated in place, tells the current state:
     Name    Saving  Update time
     Rabbit  $100    2017/03/20:12:59:33
     Tiger   $250    2017/03/20:12:59:33
     A series of events, immutable, tells the history:
     Name    Daily Transaction  Date
     Rabbit  +$100,000          2017/03/19
     Rabbit  -$99,900           2017/03/20
     Tiger   +$125              2017/03/19
     Tiger   +$125              2017/03/20
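The relationship between the two tables can be sketched in Go: folding the immutable transaction history reproduces the current savings. This is only an illustration of the idea on the slide; the `Transaction` type and `balances` function are hypothetical names, not part of Xephon-K.

```go
package main

import "fmt"

// Transaction is one immutable event in the history (hypothetical type).
type Transaction struct {
	Name   string
	Amount int // dollars, signed
	Date   string
}

// balances folds the full event history into the current state per name.
func balances(history []Transaction) map[string]int {
	state := make(map[string]int)
	for _, tx := range history {
		state[tx.Name] += tx.Amount
	}
	return state
}

func main() {
	// The rows of the "series of events" table from the slide.
	history := []Transaction{
		{"Rabbit", 100000, "2017/03/19"},
		{"Rabbit", -99900, "2017/03/20"},
		{"Tiger", 125, "2017/03/19"},
		{"Tiger", 125, "2017/03/20"},
	}
	state := balances(history)
	fmt.Println(state["Rabbit"], state["Tiger"]) // prints "100 250"
}
```

The history can always rebuild the state, but the state alone cannot rebuild the history, which is why the events are the data worth keeping.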
  6. Time Series Database state of the art: Xephon-K, Cassandra, Yes, Golang, at15, N/A, 1. Full list on: https://github.com/xephonhq/awesome-time-series-database
  7. Xephon-K Design
  8. Xephon-K Implementation ● Naive schema and Cassandra data model ● Internal representation ● In Memory storage ● API
  9. Xephon-K Implementation - Naive schema
     cqlsh> SELECT * FROM metrics
     metric_name  metric_timestamp        value
     cpu          2017/03/17:13:24:00:20  10.2
     cpu          2017/03/17:13:25:00:00  3.3
     cpu          2017/03/17:13:26:00:00  5.6
     mem          2017/03/17:13:24:00:20  80.3
     mem          2017/03/17:13:25:00:00  60.2
     mem          2017/03/17:13:26:00:00  90.3
  10. Xephon-K Implementation - Naive schema
      name  metric_timestamp        val
      cpu   2017/03/17:13:24:00:20  10.2
      cpu   2017/03/17:13:25:00:00  3.3
      cpu   2017/03/17:13:26:00:00  5.6
      mem   2017/03/17:13:24:00:20  80.3
      mem   2017/03/17:13:25:00:00  60.2
      mem   2017/03/17:13:26:00:00  90.3
      The table is an abstraction of the underlying map
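The "table is an abstraction of the underlying map" remark can be made concrete with a small Go sketch. It is not Cassandra's storage engine and not Xephon-K code. It assumes that `name` plays the role of the partition key and `metric_timestamp` the clustering key, and the names `partitions`, `insert`, and `scan` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// partitions models the table as a map of maps:
// partition key (metric name) -> clustering key (timestamp) -> value.
// Assumption: name is the partition key, timestamp the clustering key.
type partitions map[string]map[int64]float64

// insert upserts one cell; writing the same (name, ts) twice overwrites.
func (p partitions) insert(name string, ts int64, val float64) {
	row, ok := p[name]
	if !ok {
		row = make(map[int64]float64)
		p[name] = row
	}
	row[ts] = val
}

// scan returns the values of one partition ordered by timestamp,
// mimicking cells kept sorted by clustering key within a partition.
func (p partitions) scan(name string) []float64 {
	row := p[name]
	keys := make([]int64, 0, len(row))
	for ts := range row {
		keys = append(keys, ts)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	vals := make([]float64, 0, len(keys))
	for _, ts := range keys {
		vals = append(vals, row[ts])
	}
	return vals
}

func main() {
	p := partitions{}
	p.insert("cpu", 3, 5.6) // inserted out of order on purpose
	p.insert("cpu", 1, 10.2)
	p.insert("cpu", 2, 3.3)
	p.insert("mem", 1, 80.3)
	fmt.Println(p.scan("cpu")) // prints "[10.2 3.3 5.6]"
}
```

Seen this way, `SELECT * FROM metrics` is a walk over every outer key, while reading one metric touches a single inner map.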
  12. Xephon-K Implementation - Internal representation
      // A point is a timestamp T and a value V.
      type IntPoint struct {
          T int64
          V int
      }
      type DoublePoint struct {
          T int64
          V float64 // Go has no double type; float64 is the equivalent
      }
      // A series is a name, a set of tags, and its points.
      type IntSeries struct {
          Name   string
          Tags   map[string]string
          Points []IntPoint
      }
      type DoubleSeries struct {
          Name   string
          Tags   map[string]string
          Points []DoublePoint
      }
  14. Xephon-K Implementation - In Memory storage
      type Data map[SeriesID]*IntSeriesStore
      type IntSeriesStore struct {
          mu     sync.RWMutex
          series common.IntSeries
          length int
      }
      type Index []IndexRow
      type IndexRow struct {
          key      string
          value    string
          seriesID SeriesID
      }
  16. Xephon-K Implementation - API
      Write: http://localhost:2333/write
      [
        {
          "name": "archive_file_tracked",
          "tags": { "host": "server1", "data_center": "DC1" },
          "points": [ [1359788400000, 123], [1359788300000, 13], [1359788410000, 23] ]
        }
      ]
      Points are arrays instead of objects, and all numeric values are JSON numbers:
      array (used):      "points": [ [1359788400000, 123], [1359788300000, 13] ]
      object (not used): "points": [ {"t": 1359788400000, "v": 123}, {"t": 1359788300000, "v": 13} ]
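Decoding the array form of a point in Go needs a little custom code, since `[timestamp, value]` does not map onto struct fields by name. The following is a minimal sketch, not the actual Xephon-K handler: the `UnmarshalJSON` method and the `parseWrite` helper are assumptions, and the struct tags are inferred from the payload shown on the slide.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type IntPoint struct {
	T int64
	V int
}

// UnmarshalJSON decodes a point written as a two-element array [t, v].
// Assumption: this is one possible decoder, not the project's own.
func (p *IntPoint) UnmarshalJSON(b []byte) error {
	var raw []json.Number
	if err := json.Unmarshal(b, &raw); err != nil {
		return err
	}
	if len(raw) != 2 {
		return fmt.Errorf("point must be [timestamp, value], got %d elements", len(raw))
	}
	t, err := raw[0].Int64()
	if err != nil {
		return err
	}
	v, err := raw[1].Int64()
	if err != nil {
		return err
	}
	p.T, p.V = t, int(v)
	return nil
}

type IntSeries struct {
	Name   string            `json:"name"`
	Tags   map[string]string `json:"tags"`
	Points []IntPoint        `json:"points"`
}

// parseWrite decodes the body of a write request (hypothetical helper).
func parseWrite(body []byte) ([]IntSeries, error) {
	var series []IntSeries
	if err := json.Unmarshal(body, &series); err != nil {
		return nil, err
	}
	return series, nil
}

func main() {
	body := []byte(`[{"name": "archive_file_tracked",
		"tags": {"host": "server1", "data_center": "DC1"},
		"points": [[1359788400000, 123], [1359788300000, 13]]}]`)
	series, err := parseWrite(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(series[0].Name, len(series[0].Points), series[0].Points[0].T)
}
```

The array form saves the repeated `"t"` and `"v"` keys on every point, which adds up quickly in batches of many points.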
  17. Evaluation Environment Setup ● i7-6700 CPU @ 3.40GHz, 32 GB RAM, HDD, Ubuntu 16.10 (kernel 4.8.0-39) ● Docker 1.13 without resource limits on container ● InfluxDB 1.2 ● KairosDB 1.12 + Cassandra 2.2 ● Xephon-K (Go 1.7.4) + Cassandra 3.10 ● Write to one series with one tag `cpi{agent:xephon-bench}` with fixed value ● Batch size 100 points, client timeout 30 seconds ● No QPS limit, no retry, no backoff
  18. Evaluation - Throughput
  19. Evaluation - Throughput (5 seconds, 10 workers)
      Database  Total Requests
      XKM       12327
      XKC       7931
      KairosDB  15561
      InfluxDB  118
      ● InfluxDB performance is extremely poor (my bad?) ● KairosDB outperformed Xephon-K (K is from KairosDB …) ● Prometheus can’t be benchmarked (no HTTP API)
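The totals are easier to compare as rates. The sketch below converts them using the 5-second window from this slide and the 100-point batch size from the setup slide. It assumes every counted request succeeded and carried one full batch, which the slides do not state, so the points-per-second figures are an upper bound.

```go
package main

import "fmt"

const (
	durationSeconds = 5.0 // benchmark window from the slide
	batchSize       = 100 // points per request from the setup slide
)

type result struct {
	database      string
	totalRequests int
}

// requestsPerSecond converts a request total into a rate.
func requestsPerSecond(total int) float64 {
	return float64(total) / durationSeconds
}

// pointsPerSecond assumes each request carried one full batch.
func pointsPerSecond(total int) float64 {
	return requestsPerSecond(total) * batchSize
}

func main() {
	results := []result{
		{"XKM", 12327},
		{"XKC", 7931},
		{"KairosDB", 15561},
		{"InfluxDB", 118},
	}
	for _, r := range results {
		fmt.Printf("%-9s %8.1f req/s %10.0f points/s\n",
			r.database, requestsPerSecond(r.totalRequests), pointsPerSecond(r.totalRequests))
	}
}
```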
  20. Evaluation Analysis
      Q: Why is InfluxDB so slow? A: Good question, I am still figuring it out (see #15). You can’t blame Docker; running it locally gives the same result.
      Q: Why is KairosDB faster? Is Java > Golang? A: ● Lock ● Buffer (batch size)
      Q: That’s it? A: Bingo! But https://github.com/xephonhq/xephon-k/tree/master/doc/bench has a bunch of results I haven’t dealt with yet.
      Q: The chart looks good, what are you using? A: echarts3, http://echarts.baidu.com/ (One JavaScript a day keeps Microsoft Excel away)
  21. Lessons learned ● Write ugly code and make things work ● Hardware improves productivity: double the monitors, double the LOC/hr ● Source code is your best friend; don’t blindly believe what people say in docs, blogs, conferences, papers, Twitter, or Stack Overflow
  22. Related work
      Xephon-B: A TSDB benchmark tool and benchmark result sharing platform ● https://github.com/xephonhq/xephon-b ● A never-finished course project with @zchen
      Reika: A DSL for TSDB ● https://github.com/xephonhq/tsdb-proxy-java/tree/master/ql ● Also a course project (number two)
      Xephon-K: I am course project three QvQ <- Reika
  23. Future work ● Refactor (every day I am blaming the code of yesterday) ● Storage without Cassandra (yeah, this is course project four) ● Dashboard ● Benchmark-driven development using Xephon-B
  24. Acknowledgement ● Zheyuan Chen and Prof. Peter Alvaro for Xephon-B ● Chujiao Hou for Reika
  25. Conclusion ● Time series data is a series of immutable data points; it tells the history ● CQL is an illusion created for RDBMS people ● Cassandra is a map of maps that contains maps ● http://echarts.baidu.com/ is a good charting library ● Ugly code works; perfect is the enemy of the deadline (well, video games, to be honest) ● Xephon-K is awesome ● What people say in their presentations may not be true; use the source, Luke
  26. Thank You! No questions, please, just let me go.
