Seattle Data Science And Data Engineering Meetup
Abhishek Goswami
11/29/2016
abgoswam@gmail.com
https://www.linkedin.com/in/abgoswam
Table of Contents
Introduction
What is Redis
Redis Features
Demos
Internals
HyperLogLog
Pub / Sub
Persistence
Replication
Transactions
Source Code
Summary, Q&A
● Introduction
○ What is Redis?
○ Redis Features
○ Demos
● Internals
● Summary, Q&A
Introduction: What is Redis?
Data Structures Server
Redis provides access to mutable data structures via a set of commands
Commands are sent using a client-server model over TCP sockets and a simple protocol (RESP, the REdis Serialization Protocol)
Introduction: Redis Features
5 Data Types:
Strings
Hashes
Lists
Sets
Sorted Sets
Additional Functionality:
HyperLogLog
Pub / Sub
Persistence
Replication
Transactions
Scripting
Security
Special Properties
● Data structures are served from, and modified in, server memory. Redis also persists them to disk
○ This means that Redis is fast, but also non-volatile.
● Data structures are designed for memory efficiency
○ So data structures inside Redis will likely use less memory than the same data structure modeled in a high-level programming language
● Replication, tunable levels of durability, clustering, high availability
Introduction: Demos
● Introduction
● Internals
○ HyperLogLog
○ Pub / Sub
○ Persistence
○ Replication
○ Transactions
○ Source Code
● Summary, Q&A
Introduction: HyperLogLog
An approximation algorithm.
Provides an approximation of the number of unique elements in a set
Features
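Uses a very small amount of memory (at most 12 KB per key in Redis)
Standard error of 0.81%
No limit to the number of items you can count, unless you approach 2^64 items
Trick: hashes each element so its bits behave like random bits, then tracks the longest run of zero bits seen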
References:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.9475
https://www.linkedin.com/pulse/20141023230346-21057249-a-must-know-for-data-scientists-hyperloglog-algorithm
http://blog.notdot.net/2012/09/Dam-Cool-Algorithms-Cardinality-Estimation
Introduction: Publish / Subscribe
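Messaging System
Senders (called publishers in Redis terminology) send messages, while receivers (subscribers) receive them
The link over which messages are transferred is called a channel.
A couple of limitations:
System reliability (older versions): a subscriber that cannot keep up lets its outgoing buffer on the server grow without limit, which could slow down or crash Redis
Data transmission reliability: messages are fire-and-forget, so a subscriber that is disconnected when a message is published never receives it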
Internals: Persistence
In-memory data persisted to disk
Two different ways of persisting data to disk:
Snapshotting:
Take data as it exists at one moment and write it to disk
AOF (Append Only File)
Copy incoming write commands to disk as they happen
These methods can be used together, separately, or not at all in some circumstances.
Internals: Replication
Keep up-to-date copies of your data on additional machines
Other servers receive a continuously updated copy of the data as it’s being written
Two primary purposes of replication:
Availability
Replacement for Failed Master
Performance
Scalability of Reads
By combining replication and append-only files, we can configure Redis to be resilient against
system failures.
Internals: Transactions
Redis transactions allow the execution of a group of commands in a single step.
Two properties of Redis Transactions:
Sequential Execution
All commands in a transaction are sequentially executed as a single isolated operation. It is
not possible that a request issued by another client is served in the middle of the
execution of a Redis transaction
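Atomic
Either all of the commands or none are processed: commands queued after MULTI run only when EXEC is called. Redis does not, however, roll back a command that fails at runtime.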
Internals: Source Code
GitHub Repo
https://github.com/antirez/redis
Important Directories:
src: contains the Redis implementation, written in C.
tests: contains the unit tests, implemented in Tcl.
deps: contains libraries Redis uses.
Important Files:
server.h
server.c
networking.c
server.c
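int main(int argc, char **argv) {
    struct timeval tv;
    int j;
    ...
    initServer();   /* creates the event loop (server.el), opens listening sockets, registers the serverCron timer */
    ...
    aeSetBeforeSleepProc(server.el,beforeSleep);   /* beforeSleep() runs each time before the loop waits for events */
    aeMain(server.el);               /* the event loop: runs until it is told to stop */
    aeDeleteEventLoop(server.el);    /* free the event loop once aeMain() returns */
    return 0;
}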
● Introduction
● Internals
● Summary, Q&A
Summary
A tool to solve problems.
Thinking about data-driven problems moves from “How can I bend my idea to fit into the world of
tables and rows?” to “Which structures in Redis will result in an easier-to-maintain solution?”
Redis has some unique structures and functionality that most other databases do not offer
Redis is in-memory (making it fast), remote (making it accessible to multiple clients/servers),
persistent (giving you the opportunity to keep data between reboots), and scalable (via slaving
and sharding).
Scaling any system can be a challenge.
Q&A
References:
1. https://redislabs.com/ebook/redis-in-action
2. https://redis.io/documentation
3. https://pauladamsmith.com/articles/redis-under-the-hood.html
4. https://github.com/antirez/redis
5. https://www.tutorialspoint.com/redis/index.htm
abgoswam@gmail.com
https://www.linkedin.com/in/abgoswam
