Chronix as Long-Term Storage for Prometheus


Talk by Moritz Kammerer (@phxql, Senior Software Engineer at QAware), given at CloudNativeCon Seattle, 2016.
Prometheus is great when it comes to monitoring and alerting. But its long-term storage options are weak compared to related time series databases (missing data distribution, sharding, etc.). At this point Chronix enters the stage. Chronix is an open source time series database. It focuses on efficient long-term storage, both in terms of storage volume and access times. Chronix achieves a compression rate of 98% compared to data in CSV files, while an average query took 21 milliseconds in a benchmark of 96 queries over different time ranges and time series. Chronix offers a generic multi-dimensional data model for storing all kinds of time series, functions for anomaly detection as used in the frameworks EGADS and SAX, and an integration with Apache Spark that allows for distributed time series processing. In this code-intense session we show the integration of Prometheus and Chronix. We also dig into the details of Chronix and explain why Chronix <3 Prometheus and vice versa. Furthermore, we demonstrate a toolchain: collect data with Prometheus, pipe it to Chronix, visualize both data sources in Grafana [5], and easily analyze tons of data with Spark and Apache Zeppelin.

Published in: Data & Analytics

Chronix as Long-Term Storage for Prometheus

  1. 1. Chronix as long term storage for Prometheus Florian Lautenschlager, Moritz Kammerer @flolaut, @phxql
  2. 2. Real-time monitoring and alerting for cloud native apps to detect anomalies close to their occurrence and to initiate measures. [Diagram: Prometheus monitoring several cloud native applications; time axis covering the 14 days up to NOW]
  3. 3. Beyond real-time monitoring of cloud native apps? Nothing more to do?
  4. 4. Prometheus: Real-time monitoring and alerting for cloud native apps to detect anomalies close to their occurrence and to initiate measures. Chronix: Lossless long term storage to store data forever, allowing analyses beyond real-time monitoring! [Diagram: Prometheus monitoring several cloud native applications; time axis from THEN to NOW]
  5. 5. Agenda ■ Some words about Chronix, its architecture, its features, and its performance. ■ How we built the integration with Prometheus. ■ Showcase: Prometheus, Chronix Ingester, Chronix, and Grafana
  6. 6. Chronix is more than just a simple time series database. It’s a time series processing tool stack for all purposes.
  7. 7. Time Series Database: What’s that? ■ Definition 1: “A sample s is a tuple of {timestamp, value}, where the value could be any kind of object.” ■ Definition 2: “A time series T is an arbitrary list of chronologically ordered samples of one value type.” ■ Definition 3: “A chunk C is a chronologically ordered part of a time series.” ■ Definition 4: “A time series database TSDB is a specialized database for storing and retrieving time series in an efficient and optimized way.” [Diagram: a sample s = {t,v}; a time series T = {s1,s2}; a time series T split into chunks C; a TSDB holding chunks of several time series]
  8. 8. Chronix’ architecture enables both efficient storage of time series and millisecond range queries. Steps: (1) Semantic Transformation, (2) Attributes and Chunks, (3) Basic Compression, (4) Multi-Dimensional Storage. [Diagram: a record with data:<chunk> and attributes becomes a record with data:compressed <chunk> and attributes in the record storage; one part of the pipeline is marked optional. Example figures: 68 billion points = 1 million chunks * 68,000 points, ~96% compression]
  9. 9. The key data type of Chronix is called a record. It stores a compressed time series chunk and its attributes. record{ data:compressed{<chunk>} //technical fields id: 3dce1de0−...−93fb2e806d19 version: 1501692859622883300 start: 1427457011238 end: 1427471159292 //optional attributes host: prodI5 process: scheduler group: jmx metric: heapMemory.Usage.Used max: 896.571 } Data:compressed{<chunk of time series data>} ■ Time Series: timestamp, numeric value ■ Traces: calls, exceptions, … ■ Logs: access, method runtimes ■ Complex data: models, test coverage, anything else… Optional attributes ■ Arbitrary attributes for the time series ■ Attributes are indexed ■ Make the chunk searchable ■ Can contain pre-calculated values
  10. 10. Chronix provides specialized aggregations, transformations, and analyses for time series that are commonly used. Aggregations: ■ Min / Max / Average / Sum / Count ■ Percentile ■ Standard Deviation ■ First / Last ■ Range. Analyses: ■ Trend Analysis (using a linear regression model) ■ Outlier Analysis (using the IQR) ■ Frequency Analysis (check occurrence within a time range) ■ Fast Dynamic Time Warping (time series similarity search) ■ Symbolic Aggregate Approximation (similarity and pattern search). Transformations: ■ Bottom/Top n-values ■ Moving average ■ Divide / Scale ■ Downsampling. Many more.
  11. 11. Only scalar values? One size fits all? No! What about logs, traces, and others? No problem – Just do it yourself! ■ Chronix Time Series ■ Time series framework that is used by Chronix. ■ Time series types: ■ Numeric: Doubles (the default time series type) ■ More to come. public interface TimeSeriesConverter<T> { /** * Shall create an object of type T from the given binary time series. */ T from(BinaryTimeSeries binaryTimeSeriesChunk, long queryStart, long queryEnd); /** * Shall do the conversion of the custom time series T into the binary time series that is stored. */ BinaryTimeSeries to(T timeSeriesChunk); }
  12. 12. That’s the easiest way to play with Chronix: a single instance of Chronix on a single node. [Diagram: on your computer, Java 8 (JRE) runs Solr 6.2.1 (on Lucene) on port 8983 with Chronix 0.4 as Solr plugins: Chronix-Query-Handler, Chronix-Ingestion-Handler, Chronix-Retention, Chronix-Compaction-Handler. Access is over HTTP. Further labels: Chronix Client (Java, Go); OpenTSDB, Prometheus, KairosDB, InfluxDB, Graphite]
  13. 13. Code-Slide: How to set up Chronix, ask for time series data, and call some server-side aggregations in Java. ■ Create a connection to Solr and set up Chronix ■ Define a range query and stream its results ■ Call some aggregations solr = new HttpSolrClient("http://localhost:8983/solr/chronix/") chronix = new ChronixClient(new MetricTimeSeriesConverter<>(), new ChronixSolrStorage(200, groupBy, reduce)) query = new SolrQuery("metric:*Load*") stream = chronix.stream(solr, query) query.addFilterQuery("function=max,min,count,sdiff") stream = chronix.stream(solr, query) Annotations: groupBy and reduce group chunks on a combination of attributes and reduce them to a time series. The query gets all time series whose metric contains Load. Signed Difference (sdiff): First=20, Last=-100 -> -80
  14. 14. Compared to other time series databases, Chronix’ results for our use case are outstanding. ■ We compared Chronix with: ■ InfluxDB, OpenTSDB, and KairosDB ■ All databases are configured as a single node ■ Storage demand for 108 GB of raw CSV time series data. ■ Chronix (8.7 GB) saves 20% – 84% of the space the other time series databases need. ■ Query times on imported data. ■ 73% – 92% faster on data retrieval. ■ 80% – 97% faster on a mix of analyses. ■ Memory footprint: after start, max during import, max during query mix ■ Chronix takes 1.6 times less memory than the best alternative.
  15. 15. The hard facts. For more details, I suggest reading our research paper about Chronix: Florian Lautenschlager, Michael Philippsen, Andreas Kumlehn, Josef Adersberger. “Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in Operational Data.” FAST 2017 (submitted).
  16. 16. Let’s dig into the Chronix Ingester’s internals.
  17. 17. Big Picture. It’s a simple and scalable architecture. Prometheus (standard Prometheus installation): • Collects metrics from various services. • Writes them to its default storage. • Writes them to the Chronix Ingester using the standard remote write interface. Chronix Ingester: • Collects samples in batches and writes them later to Chronix with an ideal batch size. • Writes checkpoints to disk to avoid loss of data. • Scales easily. Chronix Server: • Lossless long term storage. • Data distribution (Apache Solr). • Rich set of analysis functions for data analytics beyond real-time monitoring.
  18. 18. Everything runs on a single machine. Small. Simple. Beautiful. [Diagram: on a single host, Prometheus sends samples (S) to an in-memory Chronix Ingester, which sends batches (B) to the Chronix Server. Legend: S = Sample: {t,v}; B = Batch: [{t,v},{t,v},{t,v}]]
  19. 19. Once per Prometheus on a single host. [Diagram: each Prometheus has its own in-memory Chronix Ingester in front of the Chronix Server. Same legend]
  20. 20. Chronix Ingester Singleton ;-) [Diagram: several Prometheus servers share one in-memory Chronix Ingester, which sends batches (B) to the Chronix Server. Same legend]
  21. 21. Chronix Ingester Cloud behind a proxy to serve multiple Prometheus servers. [Diagram: several Prometheus servers, an NGINX proxy, multiple in-memory Chronix Ingesters, and the Chronix Server. Same legend]
  22. 22. Cloud Mode: Multiple Prometheus Servers, One Chronix Ingester per Host, A Chronix Server Cloud. [Diagram: Prometheus servers and in-memory Chronix Ingesters on several single hosts, NGINX, and a Chronix Server cloud with a master]
  23. 23. Architectural Key Factor: The Chronix Ingester ■ Small Go program ■ Binary size: 8.5 MB ■ Lines of code: ~720 LoC ■ Scales easily: copy, execute ■ Handles writes from Prometheus ■ Just a small configuration: remote_write: url: http://<host>:<port>/ingest ■ Batches samples in memory ■ Prometheus sends single samples. ■ Chronix needs large chunks (n single samples) to work well. ■ Max batch age: 5M, 12H, .. ■ Crash and restart resilience ■ In-memory is dangerous: the Ingester holds some amount of transient state. ■ Regularly writes checkpoints of the entire in-memory state to disk. ■ The latest checkpoint is loaded on restart.
  24. 24. Chronix loves Chunks. Hence the Ingester batches samples.
  25. 25. The data models for Prometheus and Chronix are similar. ■ Prometheus ■ Uses so-called labels (key-value pairs) to store dimensional values ■ Labels are added dynamically ■ Stores samples (pairs of timestamp and scalar value) ■ Chronix ■ Uses attributes (key-value pairs) to store dimensional values ■ Schema, schemaless, dynamic fields, etc. ■ Stores samples of a timestamp and any value type: scalar, trace, string, etc.
  26. 26. An example Chronix schema to define the available fields. <?xml version="1.0" encoding="UTF-8" ?> <schema name="Chronix" version="1.5"> <types> <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/> <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/> <fieldType name="binary" class="solr.BinaryField"/> </types> <fields> <!-- The required fields --> <field name="id" type="string" indexed="true" stored="true" required="true"/> <field name="_version_" type="long" indexed="true" stored="true"/> <field name="start" type="long" indexed="true" stored="true" required="true"/> <field name="end" type="long" indexed="true" stored="true" required="true"/> <field name="data" type="binary" indexed="true" stored="true" required="false"/> <field name="metric" type="string" indexed="true" stored="true" required="true"/> <!-- Dynamic field for tags --> <dynamicField name="*_s" type="string" indexed="true" stored="true"/> </fields> <uniqueKey>id</uniqueKey> <solrQueryParser defaultOperator="OR"/> </schema> Annotations: the <types> element holds the definition of types; the <fields> element holds the available fields. Prometheus labels are strings. The Chronix Ingester creates them in Chronix Server dynamically using the dynamicField *_s. Mapping: Prometheus_Label -> Chronix_Label, e.g. host -> host_s
  27. 27. Showcase: Prometheus, Chronix Ingester, Chronix and Grafana. [Diagram: Prometheus sends samples (S) to the in-memory Chronix Ingester, which sends batches (B) to the Chronix Server; Grafana is part of the setup]
  28. 28. A few words about performance in our showcase. Disk usage for 11 days of data (112,815,835 samples): Prometheus: ~786 MB (whole data directory). Chronix: ~265 MB (without compaction).
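  29. 29. Compaction Effects. Contains about 112 points per chunk without compaction!

     | Compaction | Points per Chunk | Amount of Records | Disk Usage in MB | Compaction Time in Seconds |
     |------------|------------------|-------------------|------------------|----------------------------|
     | no         | -1               | 610355            | 265              | 0                          |
     | yes        | 100              | 1422369           | 357              | 134                        |
     | yes        | 500              | 284815            | 187              | 75                         |
     | yes        | 1000             | 142573            | 160              | 93                         |
     | yes        | 5000             | 28850             | 131              | 69                         |
     | yes        | 10000            | 14797             | 126              | 61                         |
     | yes        | 25000            | 6408              | 123              | 61                         |
     | yes        | 100000           | 2051              | 121              | 60                         |
     | yes        | 500000           | 920               | 119              | 63                         |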
  30. 30. A few words about performance in our showcase. CPU usage: 4 Cores available (= 400 % Max)
  31. 31. A few words about performance in our showcase. Memory consumption (max. 8 G) Ingester
  32. 32. Prometheus
  33. 33. Prometheus Configuration
  34. 34. Chronix Default Web-UI
  35. 35. Using the data source plugins for Chronix and Prometheus.
  36. 36. Ingester Health: Everything Green!
  37. 37. Short Term Data in Prometheus. Long Term Data in Chronix. See the difference?
  38. 38. Everything is open source and free to everyone. The code is the truth. Chronix Website: Chronix Github: - Ingester: Questions? - Twitter: @ChronixDB, @flolaut, @phxql - Slack:
  39. 39. Now it’s your turn.
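
The batching and label mapping described on slides 23 to 26 can be sketched in a few lines. What follows is a minimal, hypothetical Java sketch, not the Chronix Ingester's actual code (the Ingester is a Go program). The names `IngesterSketch`, `Sample`, `Batcher`, and `toChronixAttributes` are illustrative assumptions. The sketch assumes one batcher per time series, that a batch is released once its oldest sample has waited for the configured max batch age, and that every label name simply gets the `_s` suffix so that it matches the dynamicField `*_s`. Checkpointing to disk and the handling of the metric name are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the Ingester's batching idea; not the real (Go) implementation. */
final class IngesterSketch {

    /** A sample as defined in the talk: {timestamp, value}. */
    public static final class Sample {
        public final long timestampMs;
        public final double value;

        public Sample(long timestampMs, double value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    /** Collects the samples of one time series and releases them as one batch. */
    public static final class Batcher {
        private final long maxBatchAgeMs;          // assumed to correspond to "Max Batch Age"
        private List<Sample> pending = new ArrayList<>();
        private long firstArrivalMs;               // arrival time of the oldest pending sample

        public Batcher(long maxBatchAgeMs) {
            this.maxBatchAgeMs = maxBatchAgeMs;
        }

        /**
         * Adds a sample that arrived at nowMs. Returns the finished batch once the
         * oldest pending sample is at least maxBatchAgeMs old, otherwise an empty list.
         */
        public List<Sample> add(Sample sample, long nowMs) {
            if (pending.isEmpty()) {
                firstArrivalMs = nowMs;
            }
            pending.add(sample);
            if (nowMs - firstArrivalMs >= maxBatchAgeMs) {
                List<Sample> batch = pending;
                pending = new ArrayList<>();       // start a fresh batch
                return batch;
            }
            return new ArrayList<>();
        }
    }

    /** Maps Prometheus label names to attribute names matching "*_s", e.g. host -> host_s. */
    public static Map<String, String> toChronixAttributes(Map<String, String> labels) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> label : labels.entrySet()) {
            attributes.put(label.getKey() + "_s", label.getValue());
        }
        return attributes;
    }
}
```

With a max batch age of 1000 ms, samples arriving at 0 ms and 500 ms are held back, and a third sample arriving at 1000 ms releases all three as one batch. Passing the arrival time in explicitly keeps the sketch deterministic and easy to test. A real ingester would also flush on a timer, so that a series that stops receiving samples still gets written.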