Xephon-K is a time series database that uses Cassandra as its main backend. We discuss how to model time series data in Cassandra and compare its throughput with InfluxDB and KairosDB.
2. Agenda
● Overview
● Time Series Data Revisited
● Time Series Database state of the art
● Xephon-K Design
● Xephon-K Implementation
● Evaluation
● Lessons learned
● Related & Future work
● Conclusion
3. Overview
● Written in Golang (1,700 loc including bench and test)
● Use Cassandra as main backend
● Simple data model
● It is working
4. Time Series Data Revisited
NOT just data with timestamp
‘What happened, happened
and couldn’t have happened
another way’
- The Matrix
5. Time Series Data Revisited
Name    Saving  Update time
Rabbit  $100    2017/03/20:12:59:33
Tiger   $250    2017/03/20:12:59:33
Single record, update in place, tell current state
Name    Daily Transaction  Date
Rabbit  +$100,000          2017/03/19
Rabbit  -$99,900           2017/03/20
Tiger   +$125              2017/03/19
Tiger   +$125              2017/03/20
A series of events, immutable, tell the history
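The relation between the two tables above can be sketched in Go: folding the immutable transaction events reproduces the current savings, while the reverse is not possible. This is only an illustration; the `Transaction` type and the `Balances` function are hypothetical names, not part of Xephon-K.

```go
package main

import "fmt"

// Transaction is one immutable event (hypothetical type for illustration).
type Transaction struct {
	Name   string
	Amount int // dollars; positive for a deposit, negative for a withdrawal
	Date   string
}

// Balances folds the event history into the current state per name.
func Balances(events []Transaction) map[string]int {
	state := make(map[string]int)
	for _, e := range events {
		state[e.Name] += e.Amount
	}
	return state
}

func main() {
	history := []Transaction{
		{"Rabbit", 100000, "2017/03/19"},
		{"Rabbit", -99900, "2017/03/20"},
		{"Tiger", 125, "2017/03/19"},
		{"Tiger", 125, "2017/03/20"},
	}
	b := Balances(history)
	fmt.Println(b["Rabbit"], b["Tiger"]) // 100 250
}
```

The history yields the same $100 and $250 as the single-record table, but it also shows that Rabbit and Tiger got there in very different ways.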
6. Time Series Database state of the art
Xephon-K Cassandra Yes Golang at15 N/A 1
Full list on: https://github.com/xephonhq/awesome-time-series-database
9. Xephon-K Implementation - Naive schema
metric_name metric_timestamp value
cpu 2017/03/17:13:24:00:20 10.2
cpu 2017/03/17:13:25:00:00 3.3
cpu 2017/03/17:13:26:00:00 5.6
mem 2017/03/17:13:24:00:20 80.3
mem 2017/03/17:13:25:00:00 60.2
mem 2017/03/17:13:26:00:00 90.3
cqlsh> SELECT * FROM metrics
10. Xephon-K Implementation - Naive schema
name metric_timestamp val
cpu 2017/03/17:13:24:00:20 10.2
cpu 2017/03/17:13:25:00:00 3.3
cpu 2017/03/17:13:26:00:00 5.6
mem 2017/03/17:13:24:00:20 80.3
mem 2017/03/17:13:25:00:00 60.2
mem 2017/03/17:13:26:00:00 90.3
The table is an abstraction of underlying map
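One way to picture the underlying map is the Go sketch below. It assumes, since the slides do not show the CREATE TABLE statement, that the metric name is the partition key and `metric_timestamp` is the clustering key. `Table`, `Partition`, and `Scan` are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Partition maps a clustering key (metric_timestamp) to a value.
type Partition map[string]float64

// Table maps a partition key (metric name) to its partition.
type Table map[string]Partition

// Scan returns the timestamps of one partition in sorted order,
// mimicking rows being kept ordered by clustering key within a partition.
func (t Table) Scan(name string) []string {
	keys := make([]string, 0, len(t[name]))
	for ts := range t[name] {
		keys = append(keys, ts)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	metrics := Table{
		"cpu": {
			"2017/03/17:13:24:00:20": 10.2,
			"2017/03/17:13:25:00:00": 3.3,
			"2017/03/17:13:26:00:00": 5.6,
		},
		"mem": {
			"2017/03/17:13:24:00:20": 80.3,
			"2017/03/17:13:25:00:00": 60.2,
			"2017/03/17:13:26:00:00": 90.3,
		},
	}
	for _, ts := range metrics.Scan("cpu") {
		fmt.Println("cpu", ts, metrics["cpu"][ts])
	}
}
```

Seen this way, reading one metric over a time range is a lookup in the outer map followed by a walk over sorted inner keys.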
12. Xephon-K Implementation - Internal representation
type IntPoint struct {
	T int64
	V int
}
type DoublePoint struct {
	T int64
	V float64 // Go has no double type; float64 is the 64-bit float
}
type IntSeries struct {
	Name   string
	Tags   map[string]string
	Points []IntPoint
}
type DoubleSeries struct {
	Name   string
	Tags   map[string]string
	Points []DoublePoint
}
14. Xephon-K Implementation - In Memory storage
type Data map[SeriesID]*IntSeriesStore
type IntSeriesStore struct {
	mu     sync.RWMutex
	series common.IntSeries
	length int
}
type Index []IndexRow
type IndexRow struct {
	key      string
	value    string
	seriesID SeriesID
}
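A minimal sketch of how these types might be used follows. It is not the Xephon-K implementation: `SeriesID` is assumed to be a string, a plain slice stands in for `common.IntSeries`, and `Append`, `Len`, and `Lookup` are hypothetical helpers added for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

type SeriesID string // assumption: the slides do not show its definition

type IntPoint struct {
	T int64
	V int
}

type IntSeriesStore struct {
	mu     sync.RWMutex
	points []IntPoint // stands in for common.IntSeries
	length int
}

// Append adds points while holding the write lock.
func (s *IntSeriesStore) Append(pts ...IntPoint) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.points = append(s.points, pts...)
	s.length = len(s.points)
}

// Len takes only the read lock, so concurrent readers do not block each other.
func (s *IntSeriesStore) Len() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.length
}

type IndexRow struct {
	key, value string
	seriesID   SeriesID
}

type Index []IndexRow

// Lookup returns the IDs of all series tagged key=value using a linear scan.
func (idx Index) Lookup(key, value string) []SeriesID {
	var ids []SeriesID
	for _, row := range idx {
		if row.key == key && row.value == value {
			ids = append(ids, row.seriesID)
		}
	}
	return ids
}

func main() {
	data := map[SeriesID]*IntSeriesStore{"cpu-1": {}, "cpu-2": {}}
	idx := Index{{"host", "a", "cpu-1"}, {"host", "b", "cpu-2"}}

	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(t int64) {
			defer wg.Done()
			data["cpu-1"].Append(IntPoint{T: t, V: 1})
		}(int64(i))
	}
	wg.Wait()
	fmt.Println(data["cpu-1"].Len(), idx.Lookup("host", "a")) // 10 [cpu-1]
}
```

A per-series lock keeps writers to different series from contending with each other, at the cost of one mutex per series.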
19. Evaluation - Throughput
Database Total Requests
XKM 12327
XKC 7931
KairosDB 15561
InfluxDB 118
5 seconds, 10 workers
● InfluxDB performance is extremely poor (my bad?)
● KairosDB outperformed Xephon-K (K is from KairosDB …)
● Prometheus can’t be benchmarked (no HTTP API)
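To compare the totals as rates, the short Go sketch below divides each total by the run length. It assumes every row was measured over the same 5-second window stated above; `RequestsPerSecond` is a hypothetical helper, not part of the benchmark tool.

```go
package main

import "fmt"

// RequestsPerSecond converts a request count over a time window into a rate.
func RequestsPerSecond(total int, seconds float64) float64 {
	return float64(total) / seconds
}

func main() {
	const window = 5.0 // seconds, as stated for the benchmark run
	names := []string{"XKM", "XKC", "KairosDB", "InfluxDB"}
	totals := []int{12327, 7931, 15561, 118}
	for i, name := range names {
		fmt.Printf("%-8s %8.1f req/s\n", name, RequestsPerSecond(totals[i], window))
	}
}
```

Under that assumption the rates are roughly 2465 req/s for XKM, 1586 for XKC, 3112 for KairosDB, and 24 for InfluxDB.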
20. Evaluation Analysis
Q: Why is InfluxDB so slow?
A: Good question, I am still figuring it out (see #15). You can’t blame Docker: running it locally gives the same result
Q: Why is KairosDB faster? Java > Golang?
● lock
● Buffer (batch size)
Q: That’s it?
A: Bingo! But https://github.com/xephonhq/xephon-k/tree/master/doc/bench has a bunch of results I haven’t dealt with
Q: The chart looks good, what are you using?
A: echarts3 http://echarts.baidu.com/ (One JavaScript a day, Keep Microsoft Excel away)
21. Lessons learned
● Write ugly code and make things work
● Hardware improves productivity: double the monitors, double the LoC/hr
● Source code is your best friend; don’t blindly believe what people say in the doc, blog, conference, paper, twitter, stackoverflow
22. Related work
Xephon-B: A TSDB benchmark tool and benchmark result sharing platform
● https://github.com/xephonhq/xephon-b
● Is a never-finished course project with @zchen
Reika: A DSL for TSDB
● https://github.com/xephonhq/tsdb-proxy-java/tree/master/ql
● Is also a course project (number two)
Xephon-K: I am course project three QvQ
23. Future work
● Refactor (every day I am blaming the code of yesterday)
● Storage without Cassandra (yeah, this is course project four)
● Dashboard
● Benchmark driven development using Xephon-B
25. Conclusion
● Time series data is a series of immutable data points; it tells the history
● CQL is an illusion created for RDBMS people
● Cassandra is a map of maps that contains maps
● http://echarts.baidu.com/ is a good charting library
● Ugly code works, perfect is the enemy of deadline (well, video games to be honest)
● Xephon-K is awesome
● What people say in their presentation may not be true, use the source, Luke