Redis TLV Meetup
v5 & Streams
@itamarhaber, October 2018
Redis v5
Redis v5
• Released on Wed Oct 17 13:28:26 CEST 2018
• 9.5 years after v1, 15 months of development
• Major feature: Streams
• Other stuff:
– Project Spartacus
– Active Defragmentation v2
– LOLWUT
– ZPOPMIN/MAX and their blocking variants
– Integrated help for subcommands
Georg Nees' Schotter vs LOLWUT (performance)
© Victoria and Albert Museum, London
ZPOP - youtube.com/watch?v=Xk4avdjdM-E
Sorted Sets now support List-like pop operations.
● ZPOPMIN - removes and returns the
lowest-ranking member
● ZPOPMAX - same, but for
highest-ranking
● BZPOPMIN, BZPOPMAX -
blocking variants
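The pop semantics can be sketched in plain Python (a toy model of the behavior, not the actual Redis implementation): the sorted set is a member-to-score mapping, ordered by score with ties broken by member name.

```python
# Toy sketch of ZPOPMIN/ZPOPMAX semantics: order by score,
# break ties lexicographically by member name.
def zpopmin(zset: dict) -> tuple:
    """Remove and return the (member, score) pair with the lowest rank."""
    member = min(zset, key=lambda m: (zset[m], m))
    return member, zset.pop(member)

def zpopmax(zset: dict) -> tuple:
    """Same, but for the highest rank."""
    member = max(zset, key=lambda m: (zset[m], m))
    return member, zset.pop(member)

scores = {"a": 3.0, "b": 1.0, "c": 2.0}
print(zpopmin(scores))  # ('b', 1.0)
print(zpopmax(scores))  # ('a', 3.0)
```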
Integrated help for subcommands, e.g.:
Streams
A data stream is a sequence of elements. Consider:
• Real time sensor readings, e.g. particle colliders
• IoT, e.g. the irrigation of avocado groves
• User activity in an application
• …
• Messages in distributed systems
In the context of data processing...
… one in which the failure of a computer you
didn't even know existed can render your own
computer unusable.
Leslie Lamport
A distributed system is
“
... a model in which components located on
networked computers communicate and
coordinate their actions by passing messages
Distributed Computing, Wikipedia
Includes: client-server, 3/n-tier, peer to peer, SOA,
micro- & nanoservices, FaaS & serverless…
A distributed system is
“
There are only two hard problems in
distributed systems:
2. Exactly-once delivery
1. Guaranteed order of messages
2. Exactly-once delivery
Mathias Verraes, on Twitter
An observation
“
Fact #1: you can choose one and only one:
• At-most-once delivery, i.e. "fire and forget"
• At-least-once delivery, i.e. explicit
acknowledgement sent by the receiver
Fact #2: exactly-once delivery doesn't exist (unless
you change the definition of MDS)
Observation: order is usually important (duh)
Refresher on message delivery semantics
Consider the non-exhaustive list at taskqueues.com
• 17 message brokers, including: Apache Kafka,
NATS, RabbitMQ and Redis
• 17 queue solutions, including: Celery, Kue,
Laravel, Sidekiq, Resque and RQ (<- all these use
Redis as their backend btw ;))
And that's without considering protocol-based
architectures, legacy buses, etc...
This isn't exactly a new challenge
Streams: Anatomy
The Log is a storage abstraction that is:
• Append-only, can be truncated
• A sequence of records ordered by time
A Logical Log is:
• Based on a logical offset, e.g. time (vs. bytes)
• (Therefore time range queries)
• Can be made up of data structures (vs. lines)
A Stream is not unlike a (logical) log
A Stream is (also) a storage abstraction, that is
basically an ordered, logical log of messages
(records). These messages are:
● Made up of:
○ Data payload (semi-structured usually)
○ Metadata (e.g. identifier)
● Immutable, once created
● Always added to the end of the stream
So what is a stream?
A producer is a software component that appends
messages to the end of a stream.
A consumer is a software component that reads
messages from a stream (and acts on them). It can
start reading the messages from any arbitrary
offset, or just wait for new ones.
The Stream Players: producers and consumers
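The two roles above can be sketched with a toy in-memory stream (purely illustrative, not Redis code): the producer only appends, and a consumer can read from any offset.

```python
# Toy in-memory stream illustrating producer/consumer roles.
class Stream:
    def __init__(self):
        self.messages = []              # append-only log

    def add(self, payload):             # producer side: always appends
        self.messages.append(payload)
        return len(self.messages) - 1   # the new message's offset

    def read(self, offset):             # consumer side: read from any offset
        return self.messages[offset:]

s = Stream()
s.add({"temp": 21.5})
s.add({"temp": 21.7})
print(s.read(0))   # both messages
print(s.read(1))   # only the second
```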
A picture I made of a stream
(Diagram: messages 0-13 in a stream; the producer appends
at the end, while Consumer 1 and Consumer 2 each hold their
own read position, with the "next" message being the one
just past that position. Message 0 is the oldest.)
1. A component can both be a producer and a
consumer. Either for the same, or between
different streams. It depends.
2. Multiple stream producers can exist. At least one
is usually needed though.
3. A stream can exist without any consumers.
That's kind of pointless though.
Some observations
Streams: Motivation
Consider the alternative, i.e. batch processing.
Besides fast response times (batch=1):
● Scalable (distributed) design
● Loose coupling of components and faults
● Enables building more complex pipelines
Why (do architects) build stuff on streams?
… that architects like using, like:
● CQRS
● Event sourcing
● Unified (distributed commit) log
● Microservices
● ...
Also, they fit so well with other stuff...
An abstraction that is useful to work with (in
distributed systems).
Capable of trivially addressing message ordering.
Able to provide (depending on the implementation)
AMO and ALO MDS
An enabler for Stream Processing (e.g. Spark
Streams, Kafka's Stream Processors).
The Stream (for architects) is
Streams: Redis'
Necessity is the mother of invention
There ain't no such thing as a free lunch
The existing (i.e. lists, sorted sets, PubSub) isn't
"good enough" for things like:
• Log/time series-like data patterns
• At-least-once messaging with fan-out
Also Disque, listpacks, radix trees & reading Kafka :)
Why reinvent hot water (in Redis)?
“
“
● Sorted Sets? Memory hungry, no `BZPOPMIN` (at
that time ;)), ordering depends on a mutable score,
and elements must be unique
● Lists? Inefficient access (linear), index
(changeable)-based, and only have queue-like
blocking operations (single consumer)
● PubSub? Fan-out, sure, but only AMO MDS
How do you model "messages" in Redis < v5?
● A project by Salvatore Sanfilippo
● Like Redis, but is "a distributed, in-memory,
message broker"
● Eventually consistent (AP in CAP terms)
● Last updated: Jan 2016
● Planned to come back as a Redis module in v6
● Observation: A Stream "API" can also be built on
top of a message broker (see Kafka)
Interjection: What is Disque?
● A 1st-class citizen, a data structure like any other
● The most complex, implementation-wise
● Stores entries
● Is conceptually made up of 3 APIs:
a. Producer
b. Consumer
c. Consumers Group
What is the Redis Stream?
XADD key [MAXLEN [~] n] <ID | *>
<field> <value> [field value…]
> XADD mystream * foo bar baz qaz
1532556560197-42
Time complexity: O(log n)
See https://redis.io/commands/xadd
The Redis Stream Producer API
Every entry has a unique ID that is its logical offset.
The ID has the following format:
<epoch-milliseconds>-<sequence>
Each ID part is a 64-bit unsigned integer
Sequence is for ordering at millisecond scope
When user-provided, it has to be greater than the latest ID.
When auto-generated (`*`), max(localtime, latest) is used.
The entry ID
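The ordering and auto-generation rules above can be sketched as (a simplification: real IDs are 64-bit unsigned parts, and user-provided IDs are validated separately):

```python
# Sketch of how <ms>-<seq> entry IDs compare and advance.
def parse_id(entry_id: str) -> tuple:
    """Split an ID into its (milliseconds, sequence) parts."""
    ms, seq = entry_id.split("-")
    return int(ms), int(seq)

def next_id(latest: str, now_ms: int) -> str:
    """Auto-generate the next ID as max(localtime, latest): when the
    clock hasn't advanced past the latest ID, bump the sequence."""
    lms, lseq = parse_id(latest)
    if now_ms > lms:
        return f"{now_ms}-0"
    return f"{lms}-{lseq + 1}"

# IDs compare as (ms, seq) tuples, i.e. by time then by sequence:
assert parse_id("1532556560197-42") < parse_id("1532556560198-0")
print(next_id("1532556560197-42", 1532556560197))  # 1532556560197-43
print(next_id("1532556560197-42", 1532556560300))  # 1532556560300-0
```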
Redis' Stream entries are made up of field-value
pairs. Like a Hash.
Unlike a Hash, repeating field names in consecutive
entries are compressed.
Values are not compressed. Yet.
(Time series engines often compress values, with
values being after all just numbers)
The entry itself
The `MAXLEN` option is for that.
The `~` means "approximately", and is less expensive to use.
The stream is capped by the number of entries.
Not by time frame.
Future regarding that is "yet unclear" - ref:
https://stackoverflow.com/questions/51168273/redis-stream-managing-a-time-frame
Side note: capped streams
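A rough model of the capping behavior (an illustration, with arbitrary numbers: the real `~` form trims lazily, whole internal radix-tree nodes at a time):

```python
from collections import deque

# Toy MAXLEN capping: keep at most `maxlen` entries, evicting from
# the head. The approximate (`~`) form only trims once the overshoot
# is "large enough" (modeled here with an arbitrary slack of 100).
def xadd_capped(stream: deque, entry, maxlen: int, approx: bool = False):
    stream.append(entry)
    threshold = maxlen + 100 if approx else maxlen
    if len(stream) > threshold:
        while len(stream) > maxlen:
            stream.popleft()

s = deque()
for i in range(5):
    xadd_capped(s, i, maxlen=3)
print(list(s))  # [2, 3, 4] -- exact trimming keeps the newest 3
```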
XLEN key - does exactly that, not very interesting.
X[REV]RANGE key <start | -> <end | +>
[COUNT count] - much more interesting :)
Get a single entry (start = end = ID)
SCAN-like iteration on a stream (by incrementing IDs), but better
Range (timeframe) queries on a stream
What's in the stream "API"
> XRANGE mystream - +
1) 1) 1532556560197-0
   2) 1) "foo"
      2) "bar"
      3) "baz"
      4) "qaz"
A "real" picture of a stream
Yes. No. Maybe.
It can be used for consuming, but that requires the
client constantly polling the stream for new entries.
So generally, no. There's something better for
consumers.
Is X[REV]RANGE the Consumer API?
XREAD [COUNT count]
STREAMS key [key ...] ID [ID ...]
Somewhat like X[REV]RANGE, but:
● Supports multiple streams
● Easier to consume from an offset onwards
(compared to fetching ranges)
● But it is still polling, so...
The Redis Stream Consumer API
XREAD [COUNT count] [BLOCK ms]
STREAMS key [key ...] ID [ID ...]
● Like `BRPOP` (or `BZPOPMIN` ;))
● Supports the special `$` ID, i.e. "new messages
since blockage"
● What about message delivery semantics?
The Redis Stream Consumer Blocking API
Like PubSub, it appears to "fire and forget", or
at-most-once delivery for efficient fan-out.
Contrastingly, messages in a stream are stored.
The consumer manages its last read ID, and can
resume from any point.
(And unlike blocking list (and zset :)) operations,
multiple consumers can consume the same stream)
XREAD [BLOCK] message delivery semantics …
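The "consumer manages its own last-read ID" pattern can be sketched like so (a toy in-memory stand-in for the stream; IDs are (ms, seq) tuples):

```python
# Toy XREAD-style consumer: track the last-read ID, resume from it.
log = [((1, 0), "a"), ((2, 0), "b"), ((3, 0), "c")]

def xread(last_id: tuple) -> list:
    """Return all entries with an ID strictly greater than last_id."""
    return [(i, v) for i, v in log if i > last_id]

last = (0, 0)
batch = xread(last)        # first read: everything so far
last = batch[-1][0]        # the consumer remembers its position
log.append(((4, 0), "d"))  # a producer adds a new entry
print(xread(last))         # resuming returns only ((4, 0), 'd')
```

Because entries are stored (unlike PubSub), a restarted consumer can simply persist `last` and pick up where it left off.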
A consumer of a stream gets all entries in order,
and will eventually become a bottleneck. Or fail.
Possible workarounds:
• Add a "type" field to each record - that's dumb
• Shard the stream to multiple keys - meh
• Have the consumer dispatch entries as jobs in
queues or messages in a … GOTO 10
The problem with scaling consumers
Consider the Stream.
There needs to be a way to construct a
high-level/pseudo consumer, one made up of
multiple instances running in parallel, each
processing a mutually-exclusive subset of the
entries.
Another, high-level, perspective
… allow multiple consumers to cooperate in
processing messages arriving in a stream, so that
each consumer in a given group takes a subset
of the messages.
Shifts the complexity of recovering from consumer
failures and group management to the Redis server
Consumer Groups
“
A group picture (via @antirez)
1. Members are identified
2. New members get only undelivered messages
3. Each message is delivered to only one member
4. A member can only read its messages
5. A member must explicitly acknowledge the
receipt of messages
Observation: Big Brother (Redis) is observing you
Trivia: this is where most of the effort went
(Consumer) Group membership rules
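Rules 2, 3 and 5 can be modeled with a few lines of Python (a toy model, not how Redis tracks state internally): each message is delivered to exactly one member, and stays pending until acknowledged.

```python
# Toy consumer group: one delivery per message, tracked until ACKed.
class Group:
    def __init__(self):
        self.next = 0   # offset of the first never-delivered message
        self.pel = {}   # msg_id -> member it was delivered to

    def readgroup(self, member: str, stream: list):
        """Deliver one new (never-before-delivered) message to `member`."""
        if self.next >= len(stream):
            return None
        msg_id = self.next
        self.next += 1
        self.pel[msg_id] = member   # pending until acknowledged
        return msg_id, stream[msg_id]

    def ack(self, msg_id: int):
        self.pel.pop(msg_id, None)  # acknowledged: drop from the PEL

stream = ["m0", "m1", "m2"]
g = Group()
print(g.readgroup("alice", stream))  # (0, 'm0')
print(g.readgroup("bob", stream))    # (1, 'm1') -- a different message
g.ack(0)
print(g.pel)                         # {1: 'bob'} -- m1 still pending
```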
XREADGROUP GROUP
<groupname> <consumername>
[COUNT count] [BLOCK ms]
STREAMS key [key ...] ID [ID ...]
● consumername is the member's ID
● groupname is the name of the group
● The special `>` ID means "new messages", any
other ID returns the consumer's history
Consumers Group API, #1
XGROUP CREATE <key> <groupname>
<id or $> // Explicit creation!
// And key must exist
XGROUP SETID <key> <id or $>
XGROUP DESTROY <key> <groupname>
XGROUP DELCONSUMER <key>
<groupname> <consumername>
Consumers Group API, #2
One of the internal data structures used.
Tracks which member saw which messages.
● When a new message is delivered, a new entry in
the list is created
● When an "old" message is delivered, the last
delivered timestamp and number of deliveries
counter (for it) are updated
The Pending Entries List (PEL) is
XACK <key> <group> <id> [<id> …]
Acknowledges the receipt of messages.
(that's at-least-once message delivery semantics)
Essentially removes them from the PEL.
Observation: consumername is not required, only
an ID, so anyone can `XACK` pending messages.
Consumers Group API, #3
XPENDING <key> <group>
[<start> <stop> <count>]
[<consumer>]
XCLAIM <key> <group> <consumer>
<min-idle-time>
<id> [<id> …] [MOAR]
CG introspection & handling consumer failures.
Consumers Group API, #4
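The failure-recovery idea behind `XCLAIM` can be sketched as follows (an illustration only; the real PEL also tracks a delivery counter, and times are in milliseconds):

```python
# Toy XCLAIM: reassign pending messages idle longer than min_idle.
def xclaim(pel: dict, new_owner: str, min_idle: int, now: int) -> list:
    """pel maps msg_id -> (owner, last_delivery_time).
    Transfer stale entries to new_owner, refreshing their timestamp."""
    claimed = []
    for msg_id, (owner, delivered_at) in pel.items():
        if now - delivered_at >= min_idle:
            pel[msg_id] = (new_owner, now)
            claimed.append(msg_id)
    return claimed

pel = {1: ("alice", 100), 2: ("alice", 950)}
print(xclaim(pel, "bob", min_idle=500, now=1000))  # [1] -- only the stale one
print(pel[1])                                      # ('bob', 1000)
```

A surviving consumer would first inspect `XPENDING` for idle entries, then claim and reprocess them, exactly the loop this sketch compresses.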
XINFO <key>
XDEL <key> <id> [<id> …]
XTRIM <key> [MAXLEN [~] <n>]
Streams API, some loose ends
Definitive answer!
Mebbe.
K10xby!!one
Questions?
• Introduction to Redis Streams https://redis.io/topics/streams-intro
• The Redis Manifesto https://github.com/antirez/redis/blob/unstable/MANIFESTO
• Salvatore's blog posts http://antirez.com/news/114 and http://antirez.com/news/116
• Salvatore's inaugural Streams demo https://www.youtube.com/watch?v=ELDzy9lCFHQ
• Salvatore's live demo at Redis Day Tel Aviv 2018
https://www.youtube.com/watch?v=qXEyuUxQXZM
• RCP 11 - The stream data type https://github.com/redis/redis-rcp/blob/master/RCP11.md
• Reddit discussion
https://www.reddit.com/r/redis/comments/4mmrgr/stream_data_structure_for_redis_lets
_design_it/
• Hacker News discussion https://news.ycombinator.com/item?id=15384396
• Consumer groups specification
https://gist.github.com/antirez/68e67f3251d10f026861be2d0fe0d2f4
• Consumer groups API https://gist.github.com/antirez/4e7049ce4fce4aa61bf0cfbc3672e64d
(some) Redis References
