Redis has become one of the critical tools in a data engineer's toolkit. This meetup offers a gentle introduction to Redis, then discusses some of its internals and usage patterns.
Redis. Seattle Data Science and Data Engineering Meetup
1. Seattle Data Science And Data Engineering Meetup
Abhishek Goswami.
11/29/2016
abgoswam@gmail.com
https://www.linkedin.com/in/abgoswam
2. Table of Contents
Introduction
What is Redis
Redis Features
Demos
Internals
HyperLogLog
Pub / Sub
Persistence
Replication
Transactions
Source Code
Summary, Q&A
3. ● Introduction
○ What is Redis?
○ Redis Features
○ Demos
● Internals
● Summary, Q&A
4. Introduction: What is Redis?
Data Structures Server
Redis provides access to mutable data structures via a set of commands
Commands are sent over TCP sockets in a client-server model, using a simple protocol
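The "simple protocol" is RESP (REdis Serialization Protocol), in which a client sends each command as an array of length-prefixed bulk strings. The sketch below shows roughly what goes over the TCP socket; the helper name `encode_command` is hypothetical and used only for illustration.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    # "*<n>\r\n" announces an array with n elements.
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode("utf-8")
        # Each element is "$<byte length>\r\n<bytes>\r\n".
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


wire = encode_command("SET", "greeting", "hello")
print(wire)
```

A client library would write these bytes to the socket and then parse the server's reply, which uses the same framing conventions.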
5. Introduction: Redis Features
5 Data Types :
Strings
Hashes
Lists
Sets
Sorted Sets
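As a rough mental model, each of the five types behaves like a familiar in-process structure. The sketch below uses plain Python objects as stand-ins, with the corresponding Redis commands in comments; the key names and values are made up for illustration, and nothing here talks to a Redis server.

```python
from collections import deque

# Strings: SET page:title "Redis 101" / GET page:title
string_value = "Redis 101"

# Hashes: HSET user:1 name Ada city Seattle / HGET user:1 name
hash_value = {"name": "Ada", "city": "Seattle"}

# Lists: RPUSH jobs job1 job2 job3 / LPOP jobs
list_value = deque(["job1", "job2"])
list_value.append("job3")
first = list_value.popleft()

# Sets: SADD tags python redis / SISMEMBER tags redis
set_value = {"python", "redis"}
is_member = "redis" in set_value

# Sorted sets: ZADD board 120 alice 95 bob 130 carol / ZRANGE board 0 -1 WITHSCORES
# Members are ordered by ascending score, with ties broken by member name.
scores = {"alice": 120, "bob": 95, "carol": 130}
ranking = sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))
print(ranking)
```

The difference in Redis is that these structures live in the server, so many clients can read and modify them through commands.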
Additional Functionality :
HyperLogLog
Pub / Sub
Persistence
Replication
Transactions
Scripting
Security
Special Properties
● Data structures are stored and modified in the server's memory. Redis also persists them to disk
○ This means that Redis is fast, but also non-volatile.
● Data structures emphasize memory efficiency
○ So a data structure inside Redis will likely use less memory than the same data structure modeled in a high-level programming language
● Replication, tunable levels of durability, clustering, high availability
8. Introduction: HyperLogLog
An approximation algorithm.
Provides an approximation of the number of unique elements in a set
Features
Uses a very small amount of memory
Standard error of 0.81%
No limit to the number of items you can count, unless you approach 2^64 items
Trick : Uses randomization
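The idea can be sketched in a few lines: hash each item, use the first bits of the hash to pick a register, and remember the longest run of leading zeros seen in the remaining bits. The class below is a minimal illustration under stated assumptions, not Redis's implementation: the name `HyperLogLogSketch`, the BLAKE2 hash, and the plain Python list of registers are all choices made here. With m = 16384 registers, the textbook error formula 1.04 / sqrt(m) gives about 0.81%, matching the figure above.

```python
import hashlib
import math


class HyperLogLogSketch:
    """Toy HyperLogLog with 2**precision registers (assumed layout)."""

    def __init__(self, precision: int = 14):
        self.p = precision
        self.m = 1 << precision          # 16384 registers when precision is 14
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")            # 64-bit hash value
        index = h >> (64 - self.p)                   # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Position of the first 1 bit in `rest`, counting from the left.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)             # bias constant for m >= 128
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = m * math.log(m / zeros)
        return estimate


hll = HyperLogLogSketch()
for i in range(100_000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")   # adding a duplicate leaves the registers unchanged

estimate = hll.count()
relative_error = abs(estimate - 100_000) / 100_000
expected_std_error = 1.04 / math.sqrt(hll.m)
print(round(estimate), f"{relative_error:.2%}", f"{expected_std_error:.2%}")
```

In Redis itself the equivalent operations are the PFADD and PFCOUNT commands; memory stays fixed no matter how many items are added, because only the registers are stored.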
References:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.9475
https://www.linkedin.com/pulse/20141023230346-21057249-a-must-know-for-data-scientists-hyperloglog-algorithm
http://blog.notdot.net/2012/09/Dam-Cool-Algorithms-Cardinality-Estimation
9. Introduction: Publish / Subscribe
Messaging System
Senders (called publishers in Redis terminology) send messages, while receivers (subscribers) receive them
The link over which messages are transferred is called a channel.
A couple of limitations
System reliability (older versions)
Data transmission reliability: messages published while a subscriber is disconnected are not redelivered
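The publisher, subscriber, and channel roles, and the delivery limitation, can be modeled in-process. This is a toy sketch, not a Redis client: `ToyBroker` and its methods are hypothetical names. It mirrors one real behavior, namely that publishing returns the number of subscribers that received the message, and a message sent when nobody is listening is simply dropped.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ToyBroker:
    """In-memory stand-in for a pub/sub server."""

    def __init__(self) -> None:
        self.channels: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self.channels[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        # Fire and forget: deliver to whoever is subscribed right now.
        subscribers = self.channels.get(channel, [])
        for callback in subscribers:
            callback(message)
        return len(subscribers)


broker = ToyBroker()
inbox: List[str] = []

lost = broker.publish("news", "nobody is listening yet")   # dropped
broker.subscribe("news", inbox.append)
delivered = broker.publish("news", "hello subscribers")
print(lost, delivered, inbox)
```

If every message must reach every consumer, a structure that stores messages (for example a Redis list used as a queue) is usually a better fit than pub/sub.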
10. Internals: Persistence
In-memory data persisted to disk
Two different ways of persisting data to disk:
Snapshotting:
Take data as it exists at one moment and write it to disk
AOF (Append Only File)
Copy incoming write commands to disk as they happen
These methods can be used together, separately, or not at all in some circumstances.
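The difference between the two approaches can be illustrated with a toy key-value store. This is a sketch under simplifying assumptions: `ToyStore` is hypothetical, JSON stands in for Redis's actual on-disk formats, and a Python list stands in for the append-only file.

```python
import json


class ToyStore:
    def __init__(self) -> None:
        self.data = {}
        self.aof = []          # stands in for the append-only file

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        self.aof.append(json.dumps(["SET", key, value]))   # log the write as it happens

    def delete(self, key: str) -> None:
        self.data.pop(key, None)
        self.aof.append(json.dumps(["DEL", key]))

    def snapshot(self) -> str:
        # Point-in-time copy of the whole dataset.
        return json.dumps(self.data)


def restore_from_snapshot(blob: str) -> ToyStore:
    store = ToyStore()
    store.data = json.loads(blob)
    return store


def restore_from_aof(lines) -> ToyStore:
    store = ToyStore()
    for line in lines:
        op, *args = json.loads(line)
        if op == "SET":
            store.data[args[0]] = args[1]
        elif op == "DEL":
            store.data.pop(args[0], None)
    return store


store = ToyStore()
store.set("a", "1")
blob = store.snapshot()        # snapshot taken here
store.set("b", "2")            # these writes happen after the snapshot
store.delete("a")

from_snapshot = restore_from_snapshot(blob).data
from_aof = restore_from_aof(store.aof).data
print(from_snapshot, from_aof)
```

Restoring from the snapshot loses every write made after it was taken, while replaying the log reproduces the latest state at the cost of a file that grows with each write.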
11. Internals: Replication
Keep up-to-date copies of your data on additional machines
Other servers receive a continuously updated copy of the data as it’s being written
Two primary purposes of replication :
Availability
Replacement for Failed Master
Performance
Scalability of Reads
By combining replication and append-only files, we can configure Redis to be resilient against
system failures.
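A toy model may help make the read-scaling trade-off concrete. Redis replication is asynchronous by default, so a replica can briefly serve stale data. In the sketch below, `ToyMaster` and `ToyReplica` are hypothetical classes, and the explicit `propagate()` call stands in for the network delay between the master and its replicas.

```python
class ToyReplica:
    def __init__(self) -> None:
        self.data = {}

    def apply(self, command) -> None:
        op, key, value = command
        if op == "SET":
            self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class ToyMaster:
    def __init__(self) -> None:
        self.data = {}
        self.replicas = []
        self.pending = []      # writes not yet sent to replicas

    def attach(self, replica: ToyReplica) -> None:
        # Initial full synchronization, then incremental updates.
        replica.data = dict(self.data)
        self.replicas.append(replica)

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        self.pending.append(("SET", key, value))

    def propagate(self) -> None:
        for command in self.pending:
            for replica in self.replicas:
                replica.apply(command)
        self.pending.clear()


master = ToyMaster()
master.set("x", "1")
replica = ToyReplica()
master.attach(replica)

master.set("y", "2")
stale = replica.get("y")       # write has not reached the replica yet
master.propagate()
fresh = replica.get("y")
print(stale, fresh, replica.get("x"))
```

Reads that can tolerate this small lag can be sent to replicas, which is where the read scalability comes from.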
12. Internals: Transactions
Redis transactions allow the execution of a group of commands in a single step.
Two properties of Redis Transactions:
Sequential Execution
All commands in a transaction are sequentially executed as a single isolated operation. It is
not possible that a request issued by another client is served in the middle of the
execution of a Redis transaction
Atomic
Either all of the commands or none are processed.
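In Redis a transaction is opened with MULTI, subsequent commands are queued rather than run, and EXEC runs the whole queue in order. The sketch below models that queueing behavior in-process; `ToyTransactionClient` and its tiny command set are hypothetical, and it does not model multiple clients or error handling.

```python
class ToyTransactionClient:
    def __init__(self, store: dict) -> None:
        self.store = store
        self.queue = None              # None means "not inside MULTI"

    def multi(self) -> str:
        self.queue = []
        return "OK"

    def call(self, name: str, *args):
        if self.queue is not None:
            self.queue.append((name, args))   # queued, not executed yet
            return "QUEUED"
        return self._run(name, args)

    def execute(self):
        # Run every queued command back to back, in order.
        queued, self.queue = self.queue, None
        return [self._run(name, args) for name, args in queued]

    def discard(self) -> str:
        self.queue = None
        return "OK"

    def _run(self, name: str, args):
        if name == "SET":
            self.store[args[0]] = args[1]
            return "OK"
        if name == "INCR":
            value = int(self.store.get(args[0], 0)) + 1
            self.store[args[0]] = str(value)
            return value
        if name == "GET":
            return self.store.get(args[0])
        raise ValueError(f"unsupported command: {name}")


store = {}
client = ToyTransactionClient(store)
client.multi()
reply = client.call("SET", "visits", "10")
client.call("INCR", "visits")
before = dict(store)               # nothing has been applied yet
results = client.execute()
print(reply, before, results, store)
```

Because the server runs the queued commands one after another without interleaving other clients' requests, the group appears to other clients as a single step.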
13. Internals: Source Code
GitHub Repo
https://github.com/antirez/redis
Important Directories:
src: contains the Redis implementation, written in C.
tests: contains the unit tests, implemented in Tcl.
deps: contains libraries Redis uses.
Important Files:
server.h
server.c
networking.c
server.c
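int main(int argc, char **argv) {
    struct timeval tv;
    int j;
    ...
    initServer();    /* set up server state, including the event loop server.el */
    ...
    aeSetBeforeSleepProc(server.el,beforeSleep);    /* hook run before each wait for events */
    aeMain(server.el);    /* run the event loop until it is stopped */
    aeDeleteEventLoop(server.el);
    return 0;
}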
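The shape of `main()` above (register a before-sleep hook, run the loop, tear it down) can be modeled with a toy event loop. This is a sketch of the general pattern, not a port of the Redis ae library: `ToyEventLoop` is hypothetical, and a queue of callables stands in for socket readiness events. The exact ordering of the hook inside the real loop may differ between Redis versions.

```python
from collections import deque


class ToyEventLoop:
    def __init__(self) -> None:
        self.stop = False
        self.before_sleep = None
        self.events = deque()          # stands in for ready file/time events

    def set_before_sleep(self, callback) -> None:
        self.before_sleep = callback

    def process_events(self) -> None:
        # A real loop would block here until a socket is readable or a timer fires.
        while self.events:
            handler = self.events.popleft()
            handler(self)

    def main(self) -> None:
        self.stop = False
        while not self.stop:
            if self.before_sleep is not None:
                self.before_sleep(self)
            self.process_events()


def request_stop(loop: ToyEventLoop) -> None:
    loop.stop = True


log = []
loop = ToyEventLoop()
loop.set_before_sleep(lambda el: log.append("before_sleep"))
loop.events.append(lambda el: log.append("handle client command"))
loop.events.append(request_stop)
loop.main()
print(log)
```

A single-threaded loop like this is one reason commands in a transaction can run without interleaving: only one handler executes at a time.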
15. Summary
A tool to solve problems.
Thinking about data-driven problems moves from “How can I bend my idea to fit into the world of
tables and rows?” to “Which structures in Redis will result in an easier-to-maintain solution?”
Has some unique structures and functionality that no other database offers
Redis is in-memory (making it fast), remote (making it accessible to multiple clients/servers),
persistent (giving you the opportunity to keep data between reboots), and scalable (via replication
and sharding).
Scaling any system can be a challenge.