Extend Redis with Modules
Itamar Haber
2
Who We Are
The open source home and commercial provider
of Redis
Open source. The leading in-memory database
platform, supporting any high performance
OLTP or OLAP use case.
Chief Developer Advocate at Redis Labs
http://bit.ly/RedisWatch &&(curat||edit||janit||)
itamar@redislabs.com
@itamarhaber
3
“He who can, does;
He who cannot,
teaches.”
– Bernard Shaw
~10 Things About Redis
5
1. Redis: REmote DIctionary Server
2. /rɛdɪs/: “red-iss”
3. OSS: http://github.com/antirez/redis
4. 3-clause BSD license: http://redis.io
5. In-memory: (always) read from RAM
6. A database for: 5 data structures
7. And: 4 (+1) more specialized ones
6
8. Developed & maintained: (mostly)
Salvatore Sanfilippo (a.k.a. @antirez)
and his OSS team at @RedisLabs
9. Short history: v1.0 August 9th, 2009
… v3.2 May 6th, 2016
10. “The Leatherman™ of Databases”:
mostly used as a DB, cache & broker
7
11. A couple or so of extra features:
(a) atomicity; (b) blocking wait;
(c) configurable persistence;
(d) data expiration and (e) eviction;
as well as transactions, PubSub, Lua
scripts, high availability & clustering
12. Next version (v4.0): MODULES!
8
Why Redis
Simplicity, Versatility, Performance
“it is very fast”
Next 3 slides
+ ‘demo’
while(!eof)
9
Redis 101
1. Redis is “NoSQL”
0. No (explicit) schema, access by key
1. Key -> structure -> data
SIMPL-ICI-TY: simple, I see it, thank you
10
Redis data strata
v1.0: Strings, Lists, Sets
v1.2: Sorted Sets
v2.0: Hashes
v2.2: Bit arrays
v2.8.9: HyperLogLog
v3.2: Geo Sets, Bit fields
v4: Streams (?), MODULES!
11
How to Redis in 3 steps:
1. 147 OSS clients in 49 languages, e.g.:
Java, Node.js, .NET, Python, Ruby…
2. You make a request, e.g.:
PING
3. The server replies, e.g.:
PONG
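On the wire, a client sends each request in the Redis serialization protocol (RESP) as an array of bulk strings, and the server answers PING with the simple string `+PONG\r\n`. The sketch below shows only the request encoding. `resp_encode` is a hypothetical helper name, not part of any client library, and it assumes the arguments are NUL-terminated text, whereas real clients are binary-safe.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encode one command as a RESP array of bulk strings.
 * Hypothetical helper: returns the number of bytes written to `out`,
 * or -1 if the buffer is too small. */
int resp_encode(char *out, size_t cap, int argc, const char **argv) {
    assert(out != NULL && argv != NULL);
    int n = snprintf(out, cap, "*%d\r\n", argc);          /* array header */
    if (n < 0 || (size_t)n >= cap) return -1;
    size_t used = (size_t)n;
    for (int i = 0; i < argc; i++) {
        /* each argument: $<length>\r\n<bytes>\r\n */
        n = snprintf(out + used, cap - used, "$%zu\r\n%s\r\n",
                     strlen(argv[i]), argv[i]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Calling it with the single argument `PING` produces the 14 bytes `*1\r\n$4\r\nPING\r\n`.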
12
~$ redis-cli
127.0.0.1:6379> SET counter 1
OK
127.0.0.1:6379> GET counter
"1"
127.0.0.1:6379> INCRBY counter 1
(integer) 2
127.0.0.1:6379> APPEND counter b||!2b
(integer) 7
127.0.0.1:6379> GETSET counter "\x00Hello\xffWorld"
"2b||!2b"
127.0.0.1:6379>
The Evolution of Versatility
14
Flexibility: model (almost) anything
with basic “building blocks” and simple
rules (v0.0.1)
Composability: transactions (v1.2) and
server-embedded scripted logic (v2.6)
Extensibility: modules (v4) for adding
custom data structures and commands
MODULES! (a.k.a. plugins)
16
First mentioned in release v1.0
https://groups.google.com/forum/#!msg/redis-db/Z0aiVSRAnRU/XezAFFtgyPUJ
“Another interesting idea is
to add support for plugins
implementing specific
commands and associated
data types, and the
embedding of a scripting
language.”
17
Redis before modules:
1. Redis is ubiquitous for fast data, fits
lots of cases (Swiss™ Army knife)
2. Some use cases need special care
3. Open source has its own agenda
So what can you do? FR, PR or fork
18
Redis with modules:
1. Core still fits lots of cases
2. Module extensions for special cases
3. A new community-driven ecosystem
4. “Give power to users to go faster”
What to expect? Nothing’s impossible!
19
Redis modules are:
1. Dynamically (server-)loaded libraries
2. Future-compatible
3. (will be mostly) written in C
4. (nearly) as fast as the core
5. Planned for public release Q3 2016
20
Modules let you:
1. Process: where the data is at
2. Compose: call core & other modules
3. Extend: new structures, commands
4. (planned) Time & keyspace triggers
5. (also) Blocking custom commands
6. (and) Cross-cluster parallelization
21
redis> ECHO "Alpha"
"Alpha"
redis> MODULE LOAD example.so
OK
redis> EXAMPLE.ECHO "Bravo"
"Bravo"
redis> ^C
~$ wc example.c
13 46 520 example.c
~$ gcc -fPIC -std=gnu99 -c -o example.o example.c
~$ ld -o example.so example.o -shared -Bsymbolic -lc
Callouts: ECHO is a core command; example.so is the module library; EXAMPLE.ECHO is the “new” command.
Redis Modules API
23
The API
1. Where most of the effort was made
2. Abstracts & isolates Redis’ internals
3. The server’s (C-) binding contract
4. Will not be broken once released
5. Exposes three conceptual layers
24
Modules API layers
1. Operational: admin, memory, disk,
replication, arguments, replies…
2. High-level: client-like access to core
and modules’ commands
3. Low-level: (almost) native access to
core data structures’ memory
~$ cat example.c: operational-API-only example
26
#include "redismodule.h"
int Echo(RedisModuleCtx *ctx,
        RedisModuleString **argv, int argc) {
    if (argc != 2) return RedisModule_WrongArity(ctx);
    return RedisModule_ReplyWithString(ctx, argv[1]); }
int RedisModule_OnLoad(RedisModuleCtx *ctx) {
    if (RedisModule_Init(ctx, "example", 1,
            REDISMODULE_APIVER_1) == REDISMODULE_ERR)
        return REDISMODULE_ERR;
    if (RedisModule_CreateCommand(ctx, "example.echo",
            Echo, "readonly", 1, 1, 1) == REDISMODULE_ERR)
        return REDISMODULE_ERR;
    return REDISMODULE_OK; }
27
#include "redismodule.h"                       /* MUST: API definitions */
int RedisModule_OnLoad(RedisModuleCtx *ctx) {  /* MUST: is called when module is loaded */
                                               /* ctx: pointer to context */
28
RedisModuleCtx *ctx
1. The module’s call execution context
2. Used by most calls to the API, just
pass it along
3. A black box: internal housekeeping
structure for tracking memory
allocations, objects, opened keys…
29
/* register the module or die trying */
if (RedisModule_Init(ctx, "example", 1,
        REDISMODULE_APIVER_1) == REDISMODULE_ERR)
    return REDISMODULE_ERR;
/* register the command */
if (RedisModule_CreateCommand(ctx, "example.echo",
        Echo, "readonly", 1, 1, 1) == REDISMODULE_ERR)
    return REDISMODULE_ERR;
30
/* arguments & count */
int Echo(RedisModuleCtx *ctx,
        RedisModuleString **argv, int argc)
/* validate number of arguments & err if needed */
if (argc != 2) return RedisModule_WrongArity(ctx);
/* send back the argument */
return RedisModule_ReplyWithString(ctx, argv[1]);
31
RedisModule_ReplyWith
• Error – duh
• Null – no words
• LongLong – integer
• String – also Simple and Buffer
• Array – Redis array (can be nested)
• CallReply – High-Level API reply
High-Level API
33
RedisModule_Call(…)
• Does: runs a command
• Expects: context, command name,
printf-like format and arguments
• Returns: RedisModuleCallReply *
• Not unlike: Redis’ Lua redis.call
35
int Educational_HighLevelAPI_Echo(RedisModuleCtx *ctx,
        RedisModuleString **argv, int argc) {
    if (argc != 2) return RedisModule_WrongArity(ctx);
    RedisModule_AutoMemory(ctx);
    RedisModuleCallReply *rep = RedisModule_Call(ctx,
        "ECHO", "s", argv[1]);
    return RedisModule_ReplyWithCallReply(ctx, rep);
}
Using the High-Level API to call
the Redis core ‘ECHO’ command...
...is impractical but educational :)
36
RedisModule_AutoMemory(…)
Automagically manages memory
• RedisModuleCallReply *
• RedisModuleString *
• RedisModuleKey *
• RedisModule_Alloc() and family
High-Level Visualization Of The Low-Level API
38
[Diagram: user app, Redis client, Redis core, data; the request GET foo returns "bar"]
39
[Diagram: user app, module, High-level API, data]
40
[Diagram: user app, Low-level API, data]
41
With the low-level API you can:
• Manage keys: open, close, type,
length, get/set TTL, delete…
• Manipulate core data structures:
e.g. RedisModule_StringSet(…),
RedisModule_ListPop(…) and
RedisModule_Zset*Range(…)
42
• Fine tune replication:
RedisModule_Replicate*(…)
• Directly access String memory:
RedisModule_StringDMA(…)
• Register custom data types:
RedisModule_CreateDataType(…)
• And much more but…
43
Build it
• Get Redis unstable version
• Read the docs
• You can also use the Redis Labs
Modules SDK to jumpstart:
https://github.com/RedisLabs/RedisModulesSDK
The Benchmark (Why Bother with Modules?)
45
[Bar chart: time needed for summing 1,000,000 Sorted Set scores, in seconds. Bars: Python (local), Lua, high-level API, low-level API. Values shown: 1.2, 1.25, 1.05, 0.1]
46
On average
about 63.79%
of all statistics
are made up
Probabilistic Data Structures (PDSs)
48
There are three kinds
of people in the world;
those who can count
and those who can’t.
49
There are three kinds
of data structures…
…and those who both
can count and can’t.
50
Data Structures of the 3rd kind
• Why: accuracy is (in theory) possible
but scale makes it (nearly) impossible
• Example: number of unique visitors
• Alternative: estimate the answer
• Data structure: the HyperLogLog
• Ergo: modules as models for PDSs
51
The “good” PDSs are
1. Efficient: sublinear space-time
2. Accurate: within their parameters
3. Scalable: by merging et al.
4. Suspiciously not unlike: the Infinite
Improbability Drive (The Hitchhiker’s
Guide to the Galaxy, Adams D.)
52
Top-K - k most frequent samples
The entire algorithm:
1. Maintain set S of k counters
2. For each sample s:
2.1 If s exists in S, increment S(s)
2.2 Otherwise, if there’s space add s
to S, else decrement all counters
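The counting scheme above is commonly known as the Frequent (or Misra-Gries) algorithm. Below is a minimal plain-C sketch of it, independent of Redis. The type and function names (`TopK`, `topk_add`, `topk_count`) and the fixed limits are hypothetical and chosen for illustration only; a counter value of 0 marks a free slot.

```c
#include <assert.h>
#include <string.h>

#define TOPK_MAX 16      /* hypothetical upper bound on k */
#define SAMPLE_LEN 32    /* hypothetical max sample length, incl. NUL */

typedef struct {
    int k;                            /* number of counters in S */
    char item[TOPK_MAX][SAMPLE_LEN];  /* tracked samples */
    long count[TOPK_MAX];             /* 0 marks a free slot */
} TopK;

void topk_init(TopK *t, int k) {
    assert(k > 0 && k <= TOPK_MAX);
    memset(t, 0, sizeof *t);
    t->k = k;
}

/* Process one sample s. */
void topk_add(TopK *t, const char *s) {
    int free_slot = -1;
    for (int i = 0; i < t->k; i++) {
        if (t->count[i] > 0 && strcmp(t->item[i], s) == 0) {
            t->count[i]++;            /* s is already in S: increment */
            return;
        }
        if (t->count[i] == 0 && free_slot < 0) free_slot = i;
    }
    if (free_slot >= 0) {             /* there is space: add s */
        strncpy(t->item[free_slot], s, SAMPLE_LEN - 1);
        t->item[free_slot][SAMPLE_LEN - 1] = '\0';
        t->count[free_slot] = 1;
        return;
    }
    for (int i = 0; i < t->k; i++)    /* no space: decrement all counters */
        t->count[i]--;
}

/* Current counter for s, or 0 if s is not tracked. */
long topk_count(const TopK *t, const char *s) {
    for (int i = 0; i < t->k; i++)
        if (t->count[i] > 0 && strcmp(t->item[i], s) == 0)
            return t->count[i];
    return 0;
}
```

Note that when the set is full the new sample is dropped; it only gets a slot on a later occurrence, once some counter has reached zero.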
53
Modeling Top-K with Redis
1. Sorted Set -> unique members
2. Member -> element and score
3. ZSCORE: O(1) membership
4. ZADD: O(Log(N)) write
5. ZRANGEBYSCORE: O(Log(N)) seek
Can this be moduled?
54
redis> TOPK.ADD tk 2 a
(integer) 1
redis> TOPK.ADD tk 2 b
(integer) 1
redis> TOPK.ADD tk 2 b
(integer) 0
redis> ZRANGE tk 1 -1 WITHSCORES
1) "a"
2) "1"
3) "b"
4) "2"
redis> TOPK.ADD tk 2 c
(integer) -1
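Callouts: in TOPK.ADD tk 2 a, 2 is the max items (a.k.a. k) and a is the sample; the numbers in the ZRANGE reply are scores; a reply of 1 means added, 0 is a frequency increment, -1 indicates eviction.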
55
redis> ZRANGE tk 1 -1 WITHSCORES
1) "b"
2) "2"
3) "c"
4) "2"
redis> TOPK.ADD tk 2 c
(integer) 0
redis> ZRANGE tk 1 -1 WITHSCORES
1) "b"
2) "2"
3) "c"
4) "3"
a evicted, c added
b’s and c’s score = 2
(global offset = -1)
56
topk Redis Module
1. Optimization: a global score offset
2. Eviction: reservoir sampling
3. TOPK.PRANK: percentile rank
4. TOPK.PRANGE: percentile range
5. Where: Redis Module Hub/topk
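One way to read the global score offset: instead of rewriting every counter when the set is full, keep a single offset and treat a member's effective count as its stored score plus that offset. The plain-C sketch below is an assumption about how such an optimization could work, modeled only on the session shown earlier; it is not the topk module's source. It does not attempt reservoir sampling, all names (`OffsetTopK`, `otk_add`, `otk_score`) are hypothetical, and the return codes merely mirror the 1 / 0 / -1 replies from that session.

```c
#include <assert.h>
#include <string.h>

#define OTK_MAX 16       /* hypothetical upper bound on k */
#define OTK_LEN 32       /* hypothetical max sample length, incl. NUL */

typedef struct {
    int k;
    long offset;                   /* global score offset, always <= 0 */
    int used[OTK_MAX];
    char item[OTK_MAX][OTK_LEN];
    long score[OTK_MAX];           /* stored score; count = score + offset */
} OffsetTopK;

void otk_init(OffsetTopK *t, int k) {
    assert(k > 0 && k <= OTK_MAX);
    memset(t, 0, sizeof *t);
    t->k = k;
}

static int otk_find(const OffsetTopK *t, const char *s) {
    for (int i = 0; i < t->k; i++)
        if (t->used[i] && strcmp(t->item[i], s) == 0) return i;
    return -1;
}

static int otk_free_slot(const OffsetTopK *t) {
    for (int i = 0; i < t->k; i++)
        if (!t->used[i]) return i;
    return -1;
}

static void otk_insert(OffsetTopK *t, int slot, const char *s) {
    strncpy(t->item[slot], s, OTK_LEN - 1);
    t->item[slot][OTK_LEN - 1] = '\0';
    t->score[slot] = 1 - t->offset;   /* effective count starts at 1 */
    t->used[slot] = 1;
}

/* Returns 1 if s took a free slot, 0 if its score was incremented,
 * -1 if the set was full and the decrement step ran. */
int otk_add(OffsetTopK *t, const char *s) {
    int i = otk_find(t, s);
    if (i >= 0) { t->score[i]++; return 0; }
    int slot = otk_free_slot(t);
    if (slot >= 0) { otk_insert(t, slot, s); return 1; }
    t->offset--;                      /* "decrement all" with one write */
    for (i = 0; i < t->k; i++)        /* evict members whose count hit 0 */
        if (t->score[i] + t->offset <= 0) t->used[i] = 0;
    slot = otk_free_slot(t);
    if (slot >= 0) otk_insert(t, slot, s);  /* admit s if a slot was freed */
    return -1;
}

/* Stored score of s, or 0 if s is not tracked. */
long otk_score(const OffsetTopK *t, const char *s) {
    int i = otk_find(t, s);
    return i >= 0 ? t->score[i] : 0;
}
```

The eviction scan here is linear in k for brevity; with a Sorted Set the candidates for eviction are simply the lowest-scored members.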
57
Bloom filter – set membership
1. Answers: “have I seen this?”
2. Good for: avoiding hard work
3. Promises: no false negatives
4. Sometimes: false positives (error)
5. Gist: hash values of the samples are
indexes in an array of counters
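The gist can be sketched as a small counting Bloom filter in plain C. This is an illustration under stated assumptions, not the code of any module: the array size, the use of two hash functions, the 8-bit counters and the FNV-1a-based hash with a seed byte are all choices made here, and every name (`CountingBloom`, `cbf_add`, `cbf_check`, `cbf_remove`) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CBF_SLOTS 1024   /* hypothetical number of counters */
#define CBF_HASHES 2     /* number of hash functions */

typedef struct { uint8_t counter[CBF_SLOTS]; } CountingBloom;

/* FNV-1a over the seed byte followed by the sample: a stand-in for
 * a family of hash functions, one per seed. */
static uint32_t cbf_hash(const char *s, uint32_t seed) {
    uint32_t h = 2166136261u;                    /* FNV offset basis */
    h = (h ^ (seed & 0xffu)) * 16777619u;        /* seed as a leading byte */
    for (; *s; s++) h = (h ^ (uint8_t)*s) * 16777619u;
    return h % CBF_SLOTS;
}

void cbf_init(CountingBloom *f) { memset(f, 0, sizeof *f); }

void cbf_add(CountingBloom *f, const char *s) {
    for (uint32_t i = 0; i < CBF_HASHES; i++) {
        uint8_t *c = &f->counter[cbf_hash(s, i)];
        if (*c < UINT8_MAX) (*c)++;              /* saturate, never wrap */
    }
}

/* 1 = possibly seen (may be a false positive), 0 = definitely not seen. */
int cbf_check(const CountingBloom *f, const char *s) {
    for (uint32_t i = 0; i < CBF_HASHES; i++)
        if (f->counter[cbf_hash(s, i)] == 0) return 0;
    return 1;
}

/* Only valid for samples that were actually added; removing anything
 * else would corrupt the counters of other samples. */
void cbf_remove(CountingBloom *f, const char *s) {
    for (uint32_t i = 0; i < CBF_HASHES; i++) {
        uint8_t *c = &f->counter[cbf_hash(s, i)];
        if (*c > 0 && *c < UINT8_MAX) (*c)--;    /* leave saturated cells */
    }
}
```

Using counters rather than single bits is what makes removal possible; a check can still return a false positive when other samples happen to cover all of a sample's slots.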
58
redis> CBF.ADD bloom a
(integer) 1
redis> CBF.ADD bloom b
(integer) 2
redis> CBF.CHECK bloom a
(integer) 1
redis> CBF.CHECK bloom b
(integer) 1
redis> CBF.CHECK bloom c
(integer) 0
[Diagram: counter array 0 1 0 2 1 0, with h1(a), h2(a), h1(b), h2(b), h1(c) and h2(c) pointing at their counters]
59
“Good coders code,
great coders reuse.”
– Peteris Krumins
60
“Good programmers know
what to write. Great ones
know what to rewrite (and
reuse).” – Eric S. Raymond
61
git clone bit.ly/dablooms
62
redablooms Redis Module
1. Error rate: defaults to 5%
2. Counting: 4-bit registers, allows
removing samples, default capacity is
100,000 samples
3. Scalable: multiple filters layered
4. Redis Module Hub/redablooms
63
Count Min Sketch - item counts
1. Unlike Top-K:
answers about any sample
2. WRT Bloom filters
Like: hashes as indexes to counters
Unlike: array per hash function,
returns the minimum of counters
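A minimal Count-Min Sketch in plain C makes the two differences concrete: one counter row per hash function, and a query that returns the minimum across rows. The dimensions, the FNV-1a-based hash with a seed byte and all names (`CountMin`, `cms_incrby`, `cms_query`) are assumptions made for this sketch and are not taken from the countminsketch module.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CMS_DEPTH 2      /* hypothetical: one row per hash function */
#define CMS_WIDTH 1024   /* hypothetical: counters per row */

typedef struct { uint32_t row[CMS_DEPTH][CMS_WIDTH]; } CountMin;

/* FNV-1a over the seed byte followed by the sample. */
static uint32_t cms_hash(const char *s, uint32_t seed) {
    uint32_t h = 2166136261u;                    /* FNV offset basis */
    h = (h ^ (seed & 0xffu)) * 16777619u;        /* seed as a leading byte */
    for (; *s; s++) h = (h ^ (uint8_t)*s) * 16777619u;
    return h % CMS_WIDTH;
}

void cms_init(CountMin *c) { memset(c, 0, sizeof *c); }

/* Add n occurrences of s: bump one counter in every row. */
void cms_incrby(CountMin *c, const char *s, uint32_t n) {
    for (uint32_t d = 0; d < CMS_DEPTH; d++)
        c->row[d][cms_hash(s, d)] += n;
}

/* Estimated count of s: the minimum over rows. Collisions can only
 * inflate a counter, so the estimate never undercounts. */
uint32_t cms_query(const CountMin *c, const char *s) {
    uint32_t min = UINT32_MAX;
    for (uint32_t d = 0; d < CMS_DEPTH; d++) {
        uint32_t v = c->row[d][cms_hash(s, d)];
        if (v < min) min = v;
    }
    return min;
}
```

Taking the minimum is what limits the damage of a collision: an inflated counter in one row is ignored as long as another row's counter for the same sample is clean.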
64
redis> CMS.INCRBY count a 1 b 2
OK
redis> CMS.QUERY count b
(integer) 2
h1: 0 1 0 0 0 2
h2: 0 0 0 0 3 0 (collision)
Reply: min[h1(b), h2(b)]
65
countminsketch Redis Module
1. Registers width: 16-bit
2. Default maximum error: 0.01%
3. Default error probability: 0.01%
4. Redis Module Hub/countminsketch
66
tdigest Redis Module
1. Purpose: streaming quantiles
2. Beauty: tiny, fast and parallelizable
3. Award: 1st community data type
4. Author: Usman Masood
5. Redis Module Hub/tdigest
67
redismodules.com: Redis Module Hub
68
What Is The Hub
1. Modules developed by: anyone
2. Certified by: Redis Labs
3. Licenses: Open Source & Commercial
4. (will be) Distributed via: Redis Cloud
and Redis Labs Enterprise Cluster
5. Where: redismodules.com
Thank you
Further Reading
71
1. The Redis Open Source Project Website – http://redis.io
2. Redis source code on GitHub – http://github.com/antirez/redis
3. Getting started:
1. An introduction to Redis data types and abstractions –
http://redis.io/topics/data-types-intro
2. Try Redis (in your browser) – http://try.redis.io
3. Karl Seguin’s The Little Redis Book –
http://openmymind.net/2012/1/23/The-Little-Redis-Book/
4. Josiah Carlson’s Redis In Action – https://redislabs.com/ebook/redis-in-action
4. Redis documentation – http://redis.io/documentation
5. Redis commands – http://redis.io/commands
6. Redis community – http://redis.io/community
7. Redis Watch newsletter – https://redislabs.com/redis-watch-archive
72
8. STREAM data structure for Redis: let's design it together! –
https://www.reddit.com/r/redis/comments/4mmrgr/stream_data_structure_for_redis_lets_design_it/
9. Redis Loadable Modules System – http://antirez.com/news/106
10. Introduction to Redis Modules API –
https://github.com/antirez/redis/blob/unstable/src/modules/INTRO.md
11. Redis Modules API reference –
https://github.com/antirez/redis/blob/unstable/src/modules/API.md
12. Creating a redis Module in 15 lines of code! –
https://gist.github.com/dvirsky/83fc32366d5ad82fc3dca47ed2704377
73
13. Infinite Improbability Drive –
https://en.wikipedia.org/wiki/Technology_in_The_Hitchhiker%27s_Guide_to_the_Galaxy#Infinite_Improbability_Drive
14. Streaming Algorithms: Frequent Items –
https://people.eecs.berkeley.edu/~satishr/cs270/sp11/rough-notes/Streaming-two.pdf
15. Space/Time Trade-offs in Hash Coding with Allowable Errors –
http://dmod.eu/deca/ft_gateway.cfm.pdf
16. Approximating Data with the Count-Min Data Structure –
http://dimacs.rutgers.edu/~graham/pubs/papers/cmsoft.pdf
17. Computing Extremely Accurate Quantiles Using T-Digests –
https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf
