Xephon-K
A lightweight TSDB with multiple backends
Pinglei Guo https://github.com/xephonhq/xephon-k
Agenda
● Overview
● Time Series Data Revisited
● Time Series Database state of the art
● Xephon-K Design
● Xephon-K Implementation
● Evaluation
● Lessons learned
● Related & Future work
● Conclusion
Overview
● Written in Go (1,700 LOC including benchmarks and tests)
● Uses Cassandra as the main backend
● Simple data model
● It works
Time Series Data Revisited
NOT just data with a timestamp
‘What happened, happened and couldn’t have happened another way’ - The Matrix
Time Series Data Revisited
Name    Saving  Update time
Rabbit  $100    2017/03/20:12:59:33
Tiger   $250    2017/03/20:12:59:33
Single record, updated in place, tells the current state
Name    Daily Transaction  Date
Rabbit  +$100,000          2017/03/19
Rabbit  -$99,900           2017/03/20
Tiger   +$125              2017/03/19
Tiger   +$125              2017/03/20
A series of events, immutable, tells the history
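The relationship between the two tables can be sketched in Go: replaying the immutable events reproduces the current state, but the state alone cannot reproduce the events. The `Transaction` type and `Balances` function are hypothetical names used only for this illustration, and both accounts are assumed to start at zero.

```go
package main

import "fmt"

// Transaction is one immutable event in an account's history.
// (Hypothetical type, not part of Xephon-K.)
type Transaction struct {
	Name   string
	Date   string
	Amount int // dollars; positive for deposits, negative for withdrawals
}

// Balances folds the event history into the current state per account,
// assuming every account starts at zero.
func Balances(events []Transaction) map[string]int {
	state := make(map[string]int)
	for _, e := range events {
		state[e.Name] += e.Amount
	}
	return state
}

func main() {
	events := []Transaction{
		{"Rabbit", "2017/03/19", 100000},
		{"Rabbit", "2017/03/20", -99900},
		{"Tiger", "2017/03/19", 125},
		{"Tiger", "2017/03/20", 125},
	}
	state := Balances(events)
	fmt.Println(state["Rabbit"], state["Tiger"]) // prints "100 250"
}
```

Both accounts end at the savings shown in the first table, yet their histories differ sharply, which is the information a time series keeps.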
Time Series Database state of the art
Xephon-K Cassandra Yes Golang at15 N/A 1
Full list on: https://github.com/xephonhq/awesome-time-series-database
Xephon-K Design
Xephon-K Implementation
● Naive schema and Cassandra data model
● Internal representation
● In Memory storage
● API
Xephon-K Implementation - Naive schema
cqlsh> SELECT * FROM metrics
metric_name  metric_timestamp        value
cpu          2017/03/17:13:24:00:20  10.2
cpu          2017/03/17:13:25:00:00  3.3
cpu          2017/03/17:13:26:00:00  5.6
mem          2017/03/17:13:24:00:20  80.3
mem          2017/03/17:13:25:00:00  60.2
mem          2017/03/17:13:26:00:00  90.3
Xephon-K Implementation - Naive schema
name metric_timestamp val
cpu 2017/03/17:13:24:00:20 10.2
cpu 2017/03/17:13:25:00:00 3.3
cpu 2017/03/17:13:26:00:00 5.6
mem 2017/03/17:13:24:00:20 80.3
mem 2017/03/17:13:25:00:00 60.2
mem 2017/03/17:13:26:00:00 90.3
The table is an abstraction over an underlying map
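A minimal Go sketch of that idea follows. It assumes the metric name is the partition key and the timestamp is the clustering key; the slides do not show the actual schema, so that choice, along with the `Partitions` and `Row` names, is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Row is one line of the tabular view that a CQL SELECT presents.
type Row struct {
	Name      string
	Timestamp string
	Val       float64
}

// Partitions models the storage view: partition key (assumed: metric name)
// -> clustering key (assumed: timestamp) -> value.
type Partitions map[string]map[string]float64

// Insert writes one cell, creating the partition on first use.
func (p Partitions) Insert(name, ts string, val float64) {
	if p[name] == nil {
		p[name] = make(map[string]float64)
	}
	p[name][ts] = val
}

// SelectAll flattens the nested maps back into table rows. Rows are sorted
// by name and timestamp only to make the output deterministic.
func (p Partitions) SelectAll() []Row {
	var rows []Row
	for name, cells := range p {
		for ts, val := range cells {
			rows = append(rows, Row{name, ts, val})
		}
	}
	sort.Slice(rows, func(i, j int) bool {
		if rows[i].Name != rows[j].Name {
			return rows[i].Name < rows[j].Name
		}
		return rows[i].Timestamp < rows[j].Timestamp
	})
	return rows
}

func main() {
	p := Partitions{}
	p.Insert("mem", "2017/03/17:13:25:00:00", 60.2)
	p.Insert("cpu", "2017/03/17:13:25:00:00", 3.3)
	p.Insert("cpu", "2017/03/17:13:24:00:20", 10.2)
	for _, r := range p.SelectAll() {
		fmt.Println(r.Name, r.Timestamp, r.Val)
	}
}
```

In this model the metric name is stored once per partition rather than once per row, which is why the flat table can be misleading about what is actually stored.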
Xephon-K Implementation - Internal representation
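// IntPoint is a single sample with an integer value.
type IntPoint struct {
    T int64 // timestamp
    V int
}
// DoublePoint is a single sample with a floating-point value.
type DoublePoint struct {
    T int64
    V float64 // Go has no double type; float64 is the equivalent
}
// IntSeries is a named, tagged sequence of integer points.
type IntSeries struct {
    Name   string
    Tags   map[string]string
    Points []IntPoint
}
// DoubleSeries is a named, tagged sequence of floating-point points.
type DoubleSeries struct {
    Name   string
    Tags   map[string]string
    Points []DoublePoint
}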
Xephon-K Implementation - In Memory storage
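// Data maps a series ID to the store holding that series.
type Data map[SeriesID]*IntSeriesStore
// IntSeriesStore guards one series with a read-write lock.
type IntSeriesStore struct {
    mu     sync.RWMutex
    series common.IntSeries
    length int
}
// Index is a flat list of tag key/value pairs, each pointing at a series.
type Index []IndexRow
type IndexRow struct {
    key      string
    value    string
    seriesID SeriesID
}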
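One way these types could be used is sketched below. The `Append`, `Len`, and `Lookup` methods are hypothetical and do not come from the slides, `SeriesID` is assumed to be a string, and `common.IntSeries` is inlined so the example is self-contained.

```go
package main

import (
	"fmt"
	"sync"
)

type SeriesID string // assumption: the slides do not show its definition

type IntPoint struct {
	T int64
	V int
}

type IntSeries struct {
	Name   string
	Tags   map[string]string
	Points []IntPoint
}

type IntSeriesStore struct {
	mu     sync.RWMutex
	series IntSeries
	length int
}

type Data map[SeriesID]*IntSeriesStore

type IndexRow struct {
	key      string
	value    string
	seriesID SeriesID
}

type Index []IndexRow

// Append adds points under the write lock (hypothetical method).
func (s *IntSeriesStore) Append(points ...IntPoint) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.series.Points = append(s.series.Points, points...)
	s.length = len(s.series.Points)
}

// Len reads the point count under the read lock (hypothetical method).
func (s *IntSeriesStore) Len() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.length
}

// Lookup scans the index for series carrying the given tag (hypothetical method).
func (idx Index) Lookup(key, value string) []SeriesID {
	var ids []SeriesID
	for _, row := range idx {
		if row.key == key && row.value == value {
			ids = append(ids, row.seriesID)
		}
	}
	return ids
}

func main() {
	store := &IntSeriesStore{series: IntSeries{Name: "cpu", Tags: map[string]string{"host": "server1"}}}
	data := Data{"cpu-server1": store}
	idx := Index{{"host", "server1", "cpu-server1"}}

	// Four concurrent writers append 100 points each to the same series.
	var wg sync.WaitGroup
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < 100; i++ {
				store.Append(IntPoint{T: int64(w*100 + i), V: i})
			}
		}(w)
	}
	wg.Wait()

	for _, id := range idx.Lookup("host", "server1") {
		fmt.Println(id, data[id].Len()) // prints "cpu-server1 400"
	}
}
```

The per-series lock keeps writers to different series from blocking each other. The `Data` map itself is not guarded in this sketch, so adding a new series concurrently would need its own synchronization.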
Xephon-K Implementation - API Write
[
    {
        "name": "archive_file_tracked",
        "tags": {
            "host": "server1",
            "data_center": "DC1"
        },
        "points": [
            [1359788400000, 123],
            [1359788300000, 13],
            [1359788410000, 23]
        ]
    }
]
http://localhost:2333/write
Points as arrays (used):
    "points": [
        [1359788400000, 123],
        [1359788300000, 13]
    ]
Points as objects (not used):
    "points": [
        {"t": 1359788400000, "v": 123},
        {"t": 1359788300000, "v": 13}
    ]
Use arrays instead of objects for points; all numeric values are JSON numbers
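A minimal sketch of decoding this payload in Go, assuming the server maps each `[t, v]` array onto the `IntPoint` struct through a custom `UnmarshalJSON`. The `DecodeWrite` function and the JSON struct tags are assumptions for illustration, not the project's actual handler.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type IntPoint struct {
	T int64
	V int
}

// UnmarshalJSON decodes a point from its compact array form [t, v].
func (p *IntPoint) UnmarshalJSON(b []byte) error {
	var pair []int64
	if err := json.Unmarshal(b, &pair); err != nil {
		return err
	}
	if len(pair) != 2 {
		return fmt.Errorf("point must be [t, v], got %d elements", len(pair))
	}
	p.T, p.V = pair[0], int(pair[1])
	return nil
}

type IntSeries struct {
	Name   string            `json:"name"`
	Tags   map[string]string `json:"tags"`
	Points []IntPoint        `json:"points"`
}

// DecodeWrite parses the body of a write request (hypothetical helper).
func DecodeWrite(body []byte) ([]IntSeries, error) {
	var series []IntSeries
	if err := json.Unmarshal(body, &series); err != nil {
		return nil, err
	}
	return series, nil
}

func main() {
	body := []byte(`[{
		"name": "archive_file_tracked",
		"tags": {"host": "server1", "data_center": "DC1"},
		"points": [[1359788400000, 123], [1359788300000, 13], [1359788410000, 23]]
	}]`)
	series, err := DecodeWrite(body)
	if err != nil {
		panic(err)
	}
	s := series[0]
	fmt.Println(s.Name, len(s.Points), s.Points[0]) // prints "archive_file_tracked 3 {1359788400000 123}"
}
```

The array form avoids repeating the `"t"` and `"v"` keys for every point, which shrinks the payload for large batches.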
Evaluation Environment Setup
● i7-6700 CPU @ 3.40GHz, 32 GB RAM, HDD, Ubuntu 16.10 (kernel 4.8.0-39)
● Docker 1.13 without resource limits on containers
● InfluxDB 1.2
● KairosDB 1.12 + Cassandra 2.2
● Xephon-K (Go 1.7.4) + Cassandra 3.10
● Write to one series with one tag `cpi{agent:xephon-bench}` with fixed value
● Batch size 100 points, client timeout 30 seconds
● No QPS limit, No retry, No backoff
Evaluation - Throughput
Evaluation - Throughput
Database  Total Requests
XKM       12327
XKC       7931
KairosDB  15561
InfluxDB  118
(5 seconds, 10 workers)
● InfluxDB performance is extremely poor (my bad?)
● KairosDB outperformed Xephon-K (K is from KairosDB …)
● Prometheus can’t be benchmarked (no HTTP API)
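The request counts can be converted into an approximate write rate. The sketch below assumes every counted request succeeded and carried a full batch of 100 points over the 5-second run; the slides report only total requests, so the resulting figures are an upper bound rather than measured values.

```go
package main

import "fmt"

// PointsPerSecond converts a request count into points written per second,
// assuming every request carried batchSize points.
func PointsPerSecond(totalRequests, batchSize int, seconds float64) float64 {
	return float64(totalRequests*batchSize) / seconds
}

func main() {
	results := []struct {
		Database string
		Total    int
	}{
		{"XKM", 12327},
		{"XKC", 7931},
		{"KairosDB", 15561},
		{"InfluxDB", 118},
	}
	for _, r := range results {
		fmt.Printf("%-8s %.0f points/s\n", r.Database, PointsPerSecond(r.Total, 100, 5))
	}
}
```

Under those assumptions, 12327 requests correspond to 12327 × 100 / 5 = 246,540 points per second, and 118 requests to 2,360 points per second.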
Evaluation Analysis
Q: Why is InfluxDB so slow?
A: Good question. I am still figuring it out (see #15). Docker is not to blame: running it locally gives the same results.
Q: Why is KairosDB faster? Java > Golang?
● Lock
● Buffer (batch size)
Q: That’s it?
A: Bingo! But https://github.com/xephonhq/xephon-k/tree/master/doc/bench has a bunch of results I haven’t dealt with
Q: The chart looks good, what are you using?
A: echarts3 http://echarts.baidu.com/ (One JavaScript a day keeps Microsoft Excel away)
Lessons learned
● Write ugly code and make things work
● Hardware improves productivity: double the monitors, double the LOC/hr
● Source code is your best friend; don’t blindly believe what people say in docs, blogs, conferences, papers, Twitter, or Stack Overflow
Related work
Xephon-B: A TSDB benchmark tool and benchmark result sharing platform
● https://github.com/xephonhq/xephon-b
● A never-finished course project with @zchen
Reika: A DSL for TSDB
● https://github.com/xephonhq/tsdb-proxy-java/tree/master/ql
● Also a course project (number two)
Xephon-K: I am course project three QvQ
Future work
● Refactor (every day I blame yesterday’s code)
● Storage without Cassandra (yeah, this is course project four)
● Dashboard
● Benchmark driven development using Xephon-B
Acknowledgement
● Zheyuan Chen and Prof. Peter Alvaro for Xephon-B
● Chujiao Hou for Reika
Conclusion
● Time series data is a series of immutable data points; it tells the history
● CQL is an illusion created for RDBMS people
● Cassandra is a map of maps that contains maps
● http://echarts.baidu.com/ is a good charting library
● Ugly code works; perfect is the enemy of the deadline (well, video games to be honest)
● Xephon-K is awesome
● What people say in their presentations may not be true; use the source, Luke
Thank You!
No question, please, just let me go.
