Redis as a time-series DB
Danni Moiseyev, Redislabs
What is a time series?
● What do you mean when you say time series?
● An ordered list of samples; each sample is a timestamp and a value.
● Usually used for metrics
● Downsampled for longer retention
Unique features of time series
● Queries are usually a range query over time
● Can be compressed by using known techniques
● Timestamps are easy to index
● Append data only
Why store time series in Redis?
● People are already doing this using SortedSets
● Serving fast queries for time series
○ Dashboards
○ Mobile/IOT
○ Fast alerts on data
● Fast ingestion rate
○ Use Redis as a first-line metrics ingester
● Ease of deployment
○ Just run Redis with the module
Using SortedSets
● Single SortedSet key
● The score is the timestamp
● The values are unique data
● Unique data can be:
○ A text blob containing the timestamp and some data
○ A “reference” to another key with the actual data
● How do you query?
1. ZRANGEBYSCORE - get all related keys
2. Fetch all data from the keys
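The two-step query pattern above can be sketched with an in-memory stand-in (a pure-Python illustration with hypothetical names; against real Redis you would use ZADD/ZRANGEBYSCORE for step 1 and GET per reference key for step 2):

```python
import bisect

class SortedSetTimeSeries:
    def __init__(self):
        self.scores = []   # sorted timestamps (the ZSET scores)
        self.members = []  # "reference" key names, aligned with scores
        self.data = {}     # the keys holding the actual sample data

    def add(self, timestamp, value):
        member = f"sample:{timestamp}"      # unique member per sample
        i = bisect.bisect_left(self.scores, timestamp)
        self.scores.insert(i, timestamp)
        self.members.insert(i, member)
        self.data[member] = value           # SET sample:<ts> <value>

    def range_query(self, start, end):
        # Step 1: ZRANGEBYSCORE equivalent - get all related keys
        lo = bisect.bisect_left(self.scores, start)
        hi = bisect.bisect_right(self.scores, end)
        members = self.members[lo:hi]
        # Step 2: fetch the actual data from each key
        return [(self.scores[lo + i], self.data[m])
                for i, m in enumerate(members)]

ts = SortedSetTimeSeries()
ts.add(1519820907, 11.2)
ts.add(1519820910, 14)
ts.add(1519820915, 18.2)
print(ts.range_query(1519820907, 1519820912))  # first two samples
```

Note how every range query needs a second round trip to fetch the real values; that extra client↔Redis traffic is one of the differences the module removes.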
Introducing redis-timeseries
● Started as an internal need for time series within Redis Labs
● Custom data structure in Redis
● Each key is a time series
● Built-in Redis facilities:
○ Persistence (AOF and RDB)
○ Replication
Differences between SortedSet and time series
● Ranged aggregations are not possible in a SortedSet
● Automatic downsampling and retention time
● Memory optimized
● Less traffic between client <-> Redis
Commands: TS.ADD
● Add a new sample:
TS.ADD KEY TIMESTAMP VALUE
● Usage:
$ TS.ADD temperature 1519820907 11.2
$ TS.ADD temperature 1519820910 14
$ TS.ADD temperature 1519820915 18.2
● Caveats:
○ You can’t add samples older than your latest timestamp
Commands: TS.INCRBY / TS.DECRBY
● Add a new sample:
TS.INCRBY KEY VALUE [RESET TIME_SECONDS]
● Usage:
$ TS.INCRBY active_machine 3
$ TS.DECRBY active_machine 1
$ TS.INCRBY active_users 1 RESET 5
● Caveats:
○ The timestamps used are based on the Redis host clock
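A minimal sketch of the RESET semantics, as I read the slide: the counter is zeroed whenever a new RESET window begins, and timestamps come from the host clock (the class and its clock injection are illustrative, not the module's implementation):

```python
import time

class ResettingCounter:
    def __init__(self, reset_seconds=None, clock=time.time):
        self.reset_seconds = reset_seconds
        self.clock = clock          # injectable so the example is deterministic
        self.value = 0.0
        self.window_start = None

    def incrby(self, amount):
        now = int(self.clock())     # timestamp taken from the host clock
        if self.reset_seconds is not None:
            window = now - now % self.reset_seconds
            if window != self.window_start:
                self.window_start = window
                self.value = 0.0    # new window: reset before incrementing
        self.value += amount
        return now, self.value

# Fake clock instead of time.time so the output is reproducible.
t = [100]
counter = ResettingCounter(reset_seconds=5, clock=lambda: t[0])
counter.incrby(1)          # window [100, 105): value 1
counter.incrby(1)          # same window: value 2
t[0] = 106
print(counter.incrby(1))   # new window [105, 110): resets, then increments to 1
```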
Command: TS.RANGE
● Range query over a specific key
TS.RANGE KEY [START TIMESTAMP] [END TIMESTAMP] [AGGREGATION TIME_BUCKET]
○ Aggregations are one of: avg, max, min, count, sum.
● Example:
$ TS.RANGE temperature 1519820907 1519830907
● Example - Aggregate to average of 5 mins (300 seconds):
$ TS.RANGE temperature 1519820907 1519830907 avg 300
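The bucketing that AGGREGATION performs can be illustrated in pure Python (a sketch of the semantics, not the module's actual implementation; bucket boundaries are assumed to be aligned to multiples of the bucket size):

```python
def ts_range(samples, start, end, agg=None, bucket=None):
    """Range query over (timestamp, value) pairs, optionally aggregated."""
    selected = [(t, v) for t, v in samples if start <= t <= end]
    if agg is None:
        return selected
    buckets = {}
    for t, v in selected:
        # Each sample falls into the bucket starting at t - t % bucket.
        buckets.setdefault(t - t % bucket, []).append(v)
    funcs = {"avg": lambda vs: sum(vs) / len(vs), "min": min,
             "max": max, "sum": sum, "count": len}
    return [(b, funcs[agg](vs)) for b, vs in sorted(buckets.items())]

samples = [(1519820907, 11.2), (1519820910, 14), (1519820915, 18.2)]
print(ts_range(samples, 1519820907, 1519830907))              # raw range
print(ts_range(samples, 1519820907, 1519830907, "avg", 300))  # 5-minute average
```

All three samples land in the same 300-second bucket, so the aggregated query returns a single averaged sample.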
Downsampling rules
● What is downsampling?
[Chart: raw data vs. the same data downsampled and aggregated using average]
Downsampling rules - cont’d
● Each key can have multiple rules
● Each rule contains:
○ Destination key
○ Aggregation type: avg, min, max, count, sum
○ Time bucket in seconds
● Example:
○ $ TS.CREATERULE temperature AVG 60 temperature_AVG_60
The destination key temperature_AVG_60 will contain the data aggregated using average over 60-second buckets.
● Downsampling rules have very little performance impact
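The low overhead comes from writing to the destination key only when a time bucket closes, so nothing is allocated per sample. A sketch under that assumption (class and names are illustrative):

```python
class DownsampleRule:
    def __init__(self, dest, bucket_seconds):
        self.dest = dest            # e.g. "temperature_AVG_60"
        self.bucket = bucket_seconds
        self.current = None         # start time of the open bucket
        self.values = []
        self.out = []               # samples written to the destination key

    def on_sample(self, timestamp, value):
        start = timestamp - timestamp % self.bucket
        if self.current is not None and start != self.current:
            # Bucket closed: write one aggregated (avg) sample to dest.
            self.out.append((self.current, sum(self.values) / len(self.values)))
            self.values = []
        self.current = start
        self.values.append(value)

rule = DownsampleRule("temperature_AVG_60", 60)
for ts, v in [(0, 10.0), (30, 20.0), (65, 40.0)]:
    rule.on_sample(ts, v)
print(rule.out)   # [(0, 15.0)] - first bucket closed when the sample at t=65 arrived
```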
Global configuration
● Load the module with default configuration for downsampling rules
● Allows calling TS.ADD without creating the key first
● Global Downsampling rules
○ Aggregation : time_bucket : retention time
○ Example: max:1m:1d;min:10s:1h;avg:2h:10d;avg:3d:100d
● Global retention policy per key
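Parsing the rule string from the example above could look like this (a sketch; the unit suffixes s/m/h/d are an assumption based on the example, and the function names are illustrative):

```python
# Seconds per assumed unit suffix.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_duration(text):
    """'1m' -> 60, '1d' -> 86400; a bare number is taken as seconds."""
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)

def parse_rules(spec):
    """Parse 'aggregation:time_bucket:retention' rules separated by ';'."""
    rules = []
    for rule in spec.split(";"):
        agg, bucket, retention = rule.split(":")
        rules.append((agg, parse_duration(bucket), parse_duration(retention)))
    return rules

print(parse_rules("max:1m:1d;min:10s:1h;avg:2h:10d;avg:3d:100d"))
```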
Recommended key topology
● Single key for data ingestion with a limited retention time
● Example: temp_sensor-43
● Downsampling rules to multiple keys
● Each key name should contain the original key, aggregation type, and bucket size
● Examples:
temp_sensor-43_AVG_60
temp_sensor-43_MAX_60
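The naming convention above is easy to generate consistently; a small helper sketch (the function name is illustrative):

```python
def downsample_key(source, agg, bucket_seconds):
    """Build '<original key>_<AGGREGATION>_<bucket seconds>'."""
    return f"{source}_{agg.upper()}_{bucket_seconds}"

print(downsample_key("temp_sensor-43", "avg", 60))  # temp_sensor-43_AVG_60
print(downsample_key("temp_sensor-43", "max", 60))  # temp_sensor-43_MAX_60
```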
Deep dive - How’s the data saved in memory?
Deep dive - chunks
● Samples are added to a chunk until the chunk is full
● Chunks are dropped once all their samples are considered obsolete
● Chunks are not updated after they are full
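The chunk behavior described above can be sketched like this (chunk size, retention handling, and names are illustrative, not the module's actual memory layout):

```python
class ChunkedSeries:
    def __init__(self, chunk_size=4, retention_seconds=100):
        self.chunk_size = chunk_size
        self.retention = retention_seconds
        self.chunks = [[]]   # list of chunks; only the last one is open

    def add(self, timestamp, value):
        if len(self.chunks[-1]) == self.chunk_size:
            self.chunks.append([])       # full chunks are never updated
        self.chunks[-1].append((timestamp, value))
        cutoff = timestamp - self.retention
        # Drop a chunk only when ALL of its samples are obsolete.
        while (len(self.chunks) > 1 and self.chunks[0]
               and self.chunks[0][-1][0] < cutoff):
            self.chunks.pop(0)

s = ChunkedSeries(chunk_size=2, retention_seconds=10)
for t in [1, 2, 3, 4, 20]:
    s.add(t, float(t))
print(s.chunks)   # early chunks dropped: every sample in them is older than 20-10
```

Dropping whole chunks instead of individual samples keeps retention enforcement cheap: it is a list pop rather than a per-sample scan.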
Streams and time series
● Using the Streams data structure instead of our memory model
● (Future) Data compression
● (Future) Using some parts of Redis Streams
Q&A
PRs are welcome
https://github.com/danni-m/redis-timeseries
Thank You

RedisConf18 - Redis as a time-series DB


Editor's Notes

  • #3 This is just a simple ordered list, where each item is a two-tuple. Example: 7 samples. Usually used to keep sample data from external producers. When used for metrics, long-term storage is achieved by downsampling the data with a specific aggregation.
  • #4 I really like time series because, if you think about it, it is a very simple form of data. The nature of this type of data is that it usually arrives already sorted (assuming we work with near real-time data). Because we assume the data is numbers only, we can act in ways that other data series can't. Timestamps can be indexed as a trie and ordered lexicographically.
  • #5 We are using it internally using Lua, sorted sets and hash tables. Facebook wrote Beringei just for this use case. Dashboards are a crucial use case for many products; serving data from disk today can be very expensive in terms of customer satisfaction. Aggregate internally using Redis and dump the aggregated data to a secondary, long-term, cheaper database. Embed Redis with time series for PoCs or embedded solutions; you just need a redis-server and a shared object file.
  • #7 Internally we were looking for a database that would provide us with replication, persistence and sharding. With Redis custom data types you get that automatically. Having a key per time series helps a lot when it comes to replication and clustering; you just hash the key name.
  • #8 Ranged aggregations are not possible in a SortedSet. Automatic downsampling and retention. Memory optimization: no need to keep another key with the actual data, so less data per key. Traffic is normally spent doing a range query and then fetching the actual data.
  • #10 You can also call TS.ADD without creating a time series first if you use global settings.
  • #11 Beta command, the name might change.
  • #13 Downsampling is a lossy compression. By aggregating time buckets using some kind of function and saving only the aggregated value, we can save a lot of samples. For example, instead of keeping samples every 10 minutes we aggregate to one per hour using the average of that hour. Downsampling can be evil: if you use it, think carefully about your aggregation function; sometimes average is not what you are looking for.
  • #14 Performance note: the data is written only when the time bucket is closed. That means we do not allocate or open any resources for the aggregation calculation; we do that only when the time bucket for the aggregation context has ended.
  • #19 Streams were designed and implemented for a general use case (events, logs and time series). Time series are a specialized stream where all the data is numeric. By leveraging the fact that all data is numeric we can do: automatic downsampling; aggregated queries; (future) data compression; (future) using some parts of Redis Streams.