Redis Streams
@itamarhaber, #Tech5 meetup @Fiverr, February 2018
Who We Are
Open source. The leading in-memory database platform,
supporting any high performance operational, analytics or
hybrid use case.
The open source home and commercial provider of Redis
Enterprise (Redis^e) technology, platform, products & services.
Hello, I am @itamarhaber, Technology Evangelist
This session is about
Redis Streams:
• First-class Redis citizens
• An abstract data type that is not unlike a log
• Designed with time series data in mind
• Provide some "Kafkaesque" messaging abilities
“Redis is an open source (BSD licensed), in-memory data
structure store, used as a database, cache and message
broker. It supports data structures such as strings,
hashes, lists, sets, sorted sets with range queries,
bitmaps, hyperloglogs and geospatial indexes with
radius queries. Redis has built-in replication, Lua
scripting, LRU eviction, transactions and different levels
of on-disk persistence, and provides high availability via
Redis Sentinel and automatic partitioning with Redis
Cluster.”
https://redis.io (emphasis for session context)
Redis is:
1. REmote DIctionary Server
2. /ˈrɛdɪs/, pronounced “red-iss”
3. OSS (BSD3), https://github.com/antirez/redis
4. In-memory, but with optional disk persistence
5. By Salvatore Sanfilippo @antirez circa 27/2/09
6. DSL4ADT: A Domain Specific Language (DSL) for
Abstract Data Types (ADT)
7. Designed for performance and simplicity
Why invent yet another Redis thingamajig?
“Necessity is the mother of invention”
“There ain't no such thing as a free lunch”
The existing structures (e.g. lists, sorted sets, PubSub) aren't
"good enough" for things like:
• Log-like data patterns
• At-least-once messaging with fan-out
And listpacks, radix trees & reading Kafka :)
The Log is (hardly a new thing)
A storage abstraction that is:
• Append-only, can be truncated
• A sequence of records ordered by time
A Logical Log is:
• Based on a logical offset, i.e. time (vs. bytes)
• Therefore supports time range queries
• Made up of in-memory data structures, naturally
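A minimal sketch of those properties, assuming a plain Python list as storage (`LogicalLog` and its methods are hypothetical names for illustration, not a Redis API): appends may never move backwards in time, truncation only drops the oldest records, and because the offsets stay sorted, a time range query is just two binary searches.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LogicalLog:
    """Toy append-only log whose logical offset is a timestamp in milliseconds."""

    offsets: List[int] = field(default_factory=list)
    records: List[Dict[str, int]] = field(default_factory=list)

    def append(self, ts_ms: int, record: Dict[str, int]) -> None:
        # Append-only: a new record may never land before the current tail.
        if self.offsets and ts_ms < self.offsets[-1]:
            raise ValueError("offsets must not decrease")
        self.offsets.append(ts_ms)
        self.records.append(record)

    def truncate(self, keep_last: int) -> None:
        # Truncation only ever drops the oldest records.
        drop = max(0, len(self.records) - keep_last)
        del self.offsets[:drop]
        del self.records[:drop]

    def time_range(self, start_ms: int, end_ms: int) -> List[Dict[str, int]]:
        # Offsets are sorted, so an inclusive time range is two binary searches.
        lo = bisect.bisect_left(self.offsets, start_ms)
        hi = bisect.bisect_right(self.offsets, end_ms)
        return self.records[lo:hi]


log = LogicalLog()
for ts, temp in [(1000, 40), (2000, 42), (3000, 45), (4000, 43)]:
    log.append(ts, {"cpu": temp})
print(log.time_range(1500, 3000))  # [{'cpu': 42}, {'cpu': 45}]
log.truncate(keep_last=2)
print(log.offsets)  # [3000, 4000]
```

Using time as the offset is what makes "give me everything between t1 and t2" cheap. A byte-offset log would need a separate index for that.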
Logging streams of semi-structured data
A data stream is a sequence of elements. Consider:
• Real-time sensor readings, e.g. particle colliders
• IoT, e.g. the irrigation of avocado groves
• User activity in an application
• …
• Messages in distributed systems
A side note about Distributed Systems
“A distributed system is a model in which
components located on networked computers
communicate and coordinate their actions by
passing messages”
Distributed Computing, Wikipedia
Includes: client-server, 3/n-tier, peer to peer, SOA,
micro- & nanoservices, FaaS & serverless…
An observation
“There are only two hard problems in
distributed systems:
2. Exactly-once delivery
1. Guaranteed order of messages
2. Exactly-once delivery”
Mathias Verraes, on Twitter
Refresher on message delivery semantics
Fact #1: you can choose one and only one:
• At-most-once delivery, i.e. "shoot and forget"
• At-least-once delivery, i.e. explicit ack
Fact #2: exactly-once delivery doesn't exist
Observation: order is usually important (duh)
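To make the trade-off concrete, here is a toy simulation in plain Python, with no broker involved. All function names are hypothetical. The handler performs its side effect and then crashes before acknowledging message "b". Fire-and-forget silently loses "b". Redelivering unacknowledged messages processes "b" twice, and also out of its original order.

```python
from collections import deque


def deliver_at_most_once(messages, handler):
    """Hand each message over once; if the handler fails, the message is lost."""
    done = []
    for msg in messages:
        try:
            handler(msg)
            done.append(msg)
        except Exception:
            pass  # no ack and no retry
    return done


def deliver_at_least_once(messages, handler, max_attempts=3):
    """Redeliver a message until the handler returns (the 'ack') or attempts run out."""
    queue = deque(messages)
    attempts = {}
    acked = []
    while queue:
        msg = queue.popleft()
        attempts[msg] = attempts.get(msg, 0) + 1
        try:
            handler(msg)
            acked.append(msg)  # explicit ack
        except Exception:
            if attempts[msg] < max_attempts:
                queue.append(msg)  # unacked: deliver it again later
    return acked, attempts


def make_flaky_handler():
    """Handler whose side effect happens before it crashes on the first 'b'."""
    side_effects = []

    def handler(msg):
        side_effects.append(msg)
        if msg == "b" and side_effects.count("b") == 1:
            raise RuntimeError("crashed before acking")

    return handler, side_effects


handler, effects = make_flaky_handler()
print(deliver_at_most_once(["a", "b", "c"], handler), effects)
# ['a', 'c'] ['a', 'b', 'c']  -> "b" was lost
handler, effects = make_flaky_handler()
print(deliver_at_least_once(["a", "b", "c"], handler)[0], effects)
# ['a', 'c', 'b'] ['a', 'b', 'c', 'b']  -> "b" ran twice, and out of order
```

The duplicate side effect is why at-least-once consumers are usually written to be idempotent.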
This isn't exactly a new challenge
Consider the non-exhaustive list at taskqueues.com:
• 17 message brokers, including: Apache Kafka,
NATS, RabbitMQ and Redis
• 17 queue solutions, including: Celery, Kue,
Laravel, Sidekiq, Resque and RQ <- all of these can use
Redis as their backend btw ;)
And that's without considering protocol-based solutions, etc.
So again, why "reinvent hot water"?
Redis (in general) and Streams (in particular) are:
• Everywhere, from the IoT's edge to the cloud
• Blazing fast, with massive throughput
• Usable from all(most) languages and platforms
(IoT microcontrollers included)
Note: apropos IoT, they make great async buffers
Redis Streams "formalism"
A stream is a sequence of entries (records). It:
• Is "sharded" by key ("topic")
• Has 1+ producers
• Has 0+ consumers
• Can provide at-most- or at-least-once semantics
• Enables stream processing/real-time pipelines
(as opposed to batch)
A picture of a stream
[Figure: a stream with entries 0 to 13. The producer appends the next entry ("*") at the tail, while Consumer 1 and Consumer 2 each keep their own position in the stream.]
Entries in the Stream
Every entry has a unique ID that is its logical offset.
The ID is in the following format:
<epoch-milliseconds>-<sequence>
Note: each ID part is a 64-bit unsigned integer.
An entry also has one or more ordered field-value
pairs, allowing for total abstraction (the empty
string is a valid field name, good for time series).
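A small sketch of auto-generated IDs, modelled on how the Redis documentation describes `*`. The rule is to use the current time in milliseconds with sequence 0. If the clock has not advanced past the last entry's time, keep the last time and increment the sequence, so IDs stay strictly increasing. The helper names are hypothetical. The last lines show why IDs must be compared as pairs of integers, not as strings.

```python
from typing import Optional, Tuple


def parse_id(entry_id: str) -> Tuple[int, int]:
    """Split '<epoch-milliseconds>-<sequence>' into two integers."""
    ms, seq = entry_id.split("-")
    return int(ms), int(seq)


def next_id(last_id: Optional[str], now_ms: int) -> str:
    """Return an ID strictly greater than last_id, even if the clock stalls or goes back."""
    if last_id is None:
        return f"{now_ms}-0"
    last_ms, last_seq = parse_id(last_id)
    if now_ms > last_ms:
        return f"{now_ms}-0"
    # Same millisecond, or a clock that moved backwards: keep the old time, bump the sequence.
    return f"{last_ms}-{last_seq + 1}"


print(next_id(None, 1518951480106))               # 1518951480106-0
print(next_id("1518951480106-0", 1518951480106))  # 1518951480106-1
print(next_id("1518951480106-1", 1518951480100))  # 1518951480106-2 (clock went back)

# IDs must be compared as (ms, seq) integer pairs, not as strings:
print("1518951480106-10" < "1518951480106-9")                      # True (wrong order)
print(parse_id("1518951480106-10") > parse_id("1518951480106-9"))  # True
```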
Streamz Demo
Or how I'm graphing my laptop's CPU and battery temperatures
using only bash, iStats, redis-cli, redis-server, Docker, Grafana &
a browser
https://github.com/itamarhaber/streams-cpubattmp
# Adding entries (the reply is the new entry's ID)
redis> XADD <key> [MAXLEN [~] <n>] <* | id>
       <field> <value> [...]
"<epoch-milliseconds>-<sequence>"
# Stream length
redis> XLEN <key>
(integer) <stream-length>
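The sketch below imitates what XADD (including exact MAXLEN trimming) and XLEN do, on a toy in-memory stream that follows the ID rules from "Entries in the Stream". `ToyStream` and its method signatures are hypothetical. Passing `now_ms` explicitly just keeps the example deterministic. Redis itself stores entries in a radix tree of listpacks, not a Python list.

```python
from typing import Dict, List, Optional, Tuple

EntryId = Tuple[int, int]


class ToyStream:
    """In-memory stand-in for a single stream key (not how Redis stores it)."""

    def __init__(self) -> None:
        self.entries: List[Tuple[EntryId, Dict[str, str]]] = []
        self.last_id: EntryId = (0, 0)  # remembered even if entries get trimmed

    def xadd(self, fields: Dict[str, str], now_ms: int,
             entry_id: str = "*", maxlen: Optional[int] = None) -> str:
        if entry_id == "*":
            # Auto ID: the current time, or the same millisecond with the next sequence number.
            ms, seq = self.last_id
            new_id = (now_ms, 0) if now_ms > ms else (ms, seq + 1)
        else:
            ms_str, seq_str = entry_id.split("-")
            new_id = (int(ms_str), int(seq_str))
            if new_id <= self.last_id:
                raise ValueError("ID must be greater than the stream's last ID")
        self.entries.append((new_id, dict(fields)))
        self.last_id = new_id
        if maxlen is not None and len(self.entries) > maxlen:
            # Exact trimming (MAXLEN without '~'): drop the oldest entries.
            del self.entries[: len(self.entries) - maxlen]
        return f"{new_id[0]}-{new_id[1]}"

    def xlen(self) -> int:
        return len(self.entries)


s = ToyStream()
print(s.xadd({"cpu": "41", "batt": "30"}, now_ms=1000))            # 1000-0
print(s.xadd({"cpu": "42", "batt": "30"}, now_ms=1000))            # 1000-1
print(s.xadd({"cpu": "44", "batt": "31"}, now_ms=1007, maxlen=2))  # 1007-0
print(s.xlen())                                                    # 2
```

With `~`, Redis may keep a few extra entries so that it can drop whole internal nodes at once, which is cheaper than trimming to the exact length.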
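# Iterating ("-" and "+" stand for the smallest and largest possible IDs)
redis> XRANGE <key> <start> <end> [COUNT <n>]
redis> XREVRANGE <key> <end> <start> [COUNT <n>]
1) 1) <entry-id>
   2) 1) <field1>
      2) <value1>
      3) ...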
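Because IDs are ordered, a range query reduces to two binary searches over the sorted IDs. This toy version uses hypothetical helper names and operates on a plain sorted list. It also models the special IDs `-` and `+`, and a bare millisecond value standing in for a full ID. Redis 5.0 handles such incomplete IDs in XRANGE the same way: as a start bound it means sequence 0, and as an end bound it means the largest sequence.

```python
import bisect
from typing import Dict, List, Optional, Tuple

EntryId = Tuple[int, int]
Entry = Tuple[EntryId, Dict[str, str]]
MAX_PART = 2**64 - 1  # each ID part is a 64-bit unsigned integer


def to_bound(text: str, is_start: bool) -> EntryId:
    """Turn '-', '+', '<ms>' or '<ms>-<seq>' into a comparable (ms, seq) pair."""
    if text == "-":
        return (0, 0)
    if text == "+":
        return (MAX_PART, MAX_PART)
    if "-" in text:
        ms, seq = text.split("-")
        return (int(ms), int(seq))
    # A bare millisecond value covers every sequence number within that millisecond.
    return (int(text), 0) if is_start else (int(text), MAX_PART)


def xrange(entries: List[Entry], start: str, end: str,
           count: Optional[int] = None) -> List[Entry]:
    """Inclusive range over entries already sorted by ID, oldest first."""
    ids = [entry_id for entry_id, _ in entries]
    lo = bisect.bisect_left(ids, to_bound(start, is_start=True))
    hi = bisect.bisect_right(ids, to_bound(end, is_start=False))
    result = entries[lo:hi]
    return result if count is None else result[:count]


def xrevrange(entries: List[Entry], end: str, start: str,
              count: Optional[int] = None) -> List[Entry]:
    """Same range, newest first; note that the higher bound comes first."""
    result = xrange(entries, start, end)[::-1]
    return result if count is None else result[:count]


stream = [((1000, 0), {"cpu": "40"}), ((1000, 1), {"cpu": "41"}),
          ((2000, 0), {"cpu": "42"}), ((3000, 0), {"cpu": "43"})]
print([i for i, _ in xrange(stream, "1000", "1000")])        # [(1000, 0), (1000, 1)]
print([i for i, _ in xrange(stream, "1000-1", "+", 2)])      # [(1000, 1), (2000, 0)]
print([i for i, _ in xrevrange(stream, "+", "-", count=1)])  # [(3000, 0)]
```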
# [Blocking] read of entries with IDs greater than <id> ("$" = only new arrivals)
redis> XREAD [COUNT <n>] [BLOCK <milliseconds>]
       STREAMS <key> [...] <id> [...]
1) 1) <key>
   2) 1) 1) <entry-id>
         2) 1) <field1>
            2) <value1>
            3) ...
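The reply is nested per key because a single XREAD can cover several streams. This toy version uses hypothetical names and does not block. It returns the entries strictly after the last ID the caller has seen, groups them by key, and leaves out keys with nothing new. The `drain` loop shows the usual consumer pattern of passing back the last ID received.

```python
from typing import Dict, List, Optional, Tuple

EntryId = Tuple[int, int]
Entry = Tuple[EntryId, Dict[str, str]]


def parse_id(text: str) -> EntryId:
    ms, seq = text.split("-")
    return int(ms), int(seq)


def xread(streams: Dict[str, List[Entry]], last_ids: Dict[str, str],
          count: Optional[int] = None) -> List[Tuple[str, List[Entry]]]:
    """Per requested key, return the entries with IDs strictly greater than the given one."""
    reply = []
    for key, last in last_ids.items():
        after = parse_id(last)
        new = [entry for entry in streams.get(key, []) if entry[0] > after]
        if count is not None:
            new = new[:count]
        if new:  # keys with nothing new are left out of the reply
            reply.append((key, new))
    return reply


def drain(streams: Dict[str, List[Entry]], key: str, last_id: str = "0-0"):
    """Consumer loop: always resume from the last ID received, two entries at a time."""
    seen = []
    while True:
        reply = xread(streams, {key: last_id}, count=2)
        if not reply:  # a real client would now block (BLOCK <ms>) instead of returning
            return seen, last_id
        for (ms, seq), fields in reply[0][1]:
            seen.append(fields["cpu"])
            last_id = f"{ms}-{seq}"


temps = {"temps": [((1000, 0), {"cpu": "40"}), ((1000, 1), {"cpu": "41"}),
                   ((2000, 0), {"cpu": "42"})]}
print(xread(temps, {"temps": "1000-1", "other": "0-0"}))
# [('temps', [((2000, 0), {'cpu': '42'})])]
print(drain(temps, "temps"))  # (['40', '41', '42'], '2000-0')
```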
# And the usual Redis goodness, e.g. TX
redis> MULTI
...
# Or server-side processing
redis> EVAL "return 'Lua Rocks!'" 0
...
# Or your own custom module
redis> MODULE LOAD <your-module-here>
OK
The problem with scaling consumers
A consumer of a stream gets all entries in order,
and will eventually become a bottleneck.
Possible workarounds:
• Add a "type" field to each record - that's dumb
• Shard the stream across multiple keys - meh
• Have the consumer dispatch entries as jobs in queues… GOTO 10
Consumer Groups
“… allow multiple consumers to cooperate in
processing messages arriving in a stream, so that
each consumer in a given group takes a subset
of the messages.”
Shifts the complexity of recovering from consumer
failures and group management to the Redis server
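Here is a minimal model of the idea, assuming a single group over a list of entry IDs. The group keeps one cursor shared by all its consumers, so each new entry goes to only one of them. A pending record then remembers which consumer owns the entry until it is acknowledged. `ToyGroup`, `read_new` and `ack` are hypothetical stand-ins, not the Redis commands.

```python
from typing import Dict, List, Tuple

EntryId = Tuple[int, int]


class ToyGroup:
    """Toy consumer group: every new entry is delivered to exactly one consumer
    and stays pending until acknowledged."""

    def __init__(self, stream: List[EntryId], last_delivered: EntryId = (0, 0)) -> None:
        self.stream = stream                   # entry IDs only, oldest first
        self.last_delivered = last_delivered   # the group's cursor into the stream
        self.pending: Dict[EntryId, str] = {}  # entry ID -> consumer that owns it

    def read_new(self, consumer: str, count: int = 1) -> List[EntryId]:
        """Like reading with '>': only entries no one in the group has received yet."""
        batch = [i for i in self.stream if i > self.last_delivered][:count]
        for entry_id in batch:
            self.pending[entry_id] = consumer
        if batch:
            self.last_delivered = batch[-1]
        return batch

    def ack(self, entry_id: EntryId) -> int:
        """Remove an entry from the pending list; returns how many were acknowledged."""
        return 1 if self.pending.pop(entry_id, None) is not None else 0


group = ToyGroup([(1, 0), (2, 0), (3, 0), (4, 0)])
print(group.read_new("escher-01", count=2))  # [(1, 0), (2, 0)]
print(group.read_new("escher-02", count=2))  # [(3, 0), (4, 0)]
print(group.read_new("escher-01"))           # []
print(group.ack((1, 0)), group.ack((1, 0)))  # 1 0
print(group.pending)  # {(2, 0): 'escher-01', (3, 0): 'escher-02', (4, 0): 'escher-02'}
```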
Group orientation
We are here :)
• Groups are named and are explicitly (!) created,
  here starting with new entries only ($):
  XGROUP CREATE temps agg $
• Consumers are also named, and each gets only a
  subset of the stream:
  XREAD-GROUP GROUP agg CONSUMER escher-01 STREAMS temps >
  (as released in Redis 5.0: XREADGROUP GROUP agg escher-01 STREAMS temps >)
• XACK, NOACK in XREAD-GROUP, XCLAIM, XPENDING...
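The pending list is what makes recovery from a crashed consumer possible. The sketch below works in the spirit of XPENDING and XCLAIM, but it is simplified: the data layout and helper names are hypothetical. Another consumer takes over entries that have been idle for at least a minimum time. A delivery counter, which this sketch increments on every claim, records how often each entry was handed out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

EntryId = Tuple[int, int]


@dataclass
class PendingEntry:
    consumer: str          # who currently owns the unacknowledged entry
    delivered_at_ms: int   # last time it was handed out
    deliveries: int = 1    # how many times it has been handed out


def pending_summary(pel: Dict[EntryId, PendingEntry]) -> Dict[str, int]:
    """Unacknowledged entries per consumer, in the spirit of XPENDING's summary."""
    counts: Dict[str, int] = {}
    for entry in pel.values():
        counts[entry.consumer] = counts.get(entry.consumer, 0) + 1
    return counts


def claim(pel: Dict[EntryId, PendingEntry], new_consumer: str, min_idle_ms: int,
          now_ms: int, ids: List[EntryId]) -> List[EntryId]:
    """Hand entries idle for at least min_idle_ms to new_consumer, in the spirit of XCLAIM."""
    claimed = []
    for entry_id in ids:
        entry = pel.get(entry_id)
        if entry is None or now_ms - entry.delivered_at_ms < min_idle_ms:
            continue  # already acknowledged, or its owner may still be working on it
        entry.consumer = new_consumer
        entry.delivered_at_ms = now_ms
        entry.deliveries += 1
        claimed.append(entry_id)
    return claimed


pel = {(1, 0): PendingEntry("escher-01", delivered_at_ms=0),
       (2, 0): PendingEntry("escher-02", delivered_at_ms=9000)}
print(pending_summary(pel))  # {'escher-01': 1, 'escher-02': 1}
print(claim(pel, "escher-03", min_idle_ms=5000, now_ms=10000, ids=[(1, 0), (2, 0)]))  # [(1, 0)]
print(pending_summary(pel))  # {'escher-03': 1, 'escher-02': 1}
```

The minimum idle time is the safety valve. It prevents two healthy consumers from stealing the same entry from each other.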
Up-to-date status (as of Jan 26th)
Presently, OSS Redis Streams are:
• Partially implemented
  – Existing commands are relatively stable
  – Some API corners are still missing, e.g. XTRIM
  – Consumer Groups are getting real
• A part of the unstable branch
• Expected to be GA as v5.0 during April 2018
Next, try Redis yourself!
• From your browser: https://try.redis.io
• Or download it: https://redis.io/download
• Or clone it: https://github.com/antirez/redis
• Or dockerize it: docker run -it redis
• Or try Redis Enterprise from https://redislabs.com
References
• The Redis Manifesto https://github.com/antirez/redis/blob/unstable/MANIFESTO
• Salvatore's blog posts http://antirez.com/news/114 and http://antirez.com/news/116
• Salvatore's Streams demo https://www.youtube.com/watch?v=ELDzy9lCFHQ
• RCP 11 - The stream data type https://github.com/redis/redis-rcp/blob/master/RCP11.md
• Reddit discussion https://www.reddit.com/r/redis/comments/4mmrgr/stream_data_structure_for_redis_lets_design_it/
• Hacker News discussion https://news.ycombinator.com/item?id=15384396
• Consumer groups specification https://gist.github.com/antirez/68e67f3251d10f026861be2d0fe0d2f4
• Consumer groups API https://gist.github.com/antirez/4e7049ce4fce4aa61bf0cfbc3672e64d
• Redis Streams and the Unified Log https://brandur.org/redis-streams
• Introduction to Redis Streams https://hackernoon.com/introduction-to-redis-streams-133f1c375cd3
Join us next month at
Redis Day Tel Aviv
Thank you,
Questions?
