Everything you always wanted to know about Redis but were afraid to ask
The document provides an in-depth overview of Redis, a popular open-source in-memory data structure store, covering its architecture, key features, and use cases. It discusses various data structures supported by Redis, scripting capabilities, persistence options, replication, and performance optimization techniques. Additionally, it highlights the significance of Redis in enhancing scalability for major platforms like Twitter and Instagram.
Who’s behind Redis?
๏ Created by Salvatore Sanfilippo
‣ http://antirez.com
‣ @antirez at Twitter
๏ Currently sponsored by Pivotal
‣ Sponsored by VMware until May 2013
Who’s using Redis? II
๏ The architecture Twitter uses to deal with 150M active users, 300K QPS, a 22 MB/S Firehose, and send tweets in under 5 seconds. High Scalability (2013)▸
๏ Storing hundreds of millions of simple key-value pairs in Redis. Instagram Engineering Blog (2012)▸
๏ The Instagram architecture Facebook bought for a cool billion dollars. High Scalability (2012)▸
๏ Facebook’s Instagram: making the switch to Cassandra from Redis, a 75% ‘insta’ savings. Planet Cassandra (2013)▸
Who’s using Redis? III
๏ Highly available real time push notifications and you. Flickr Engineering Blog (2012)▸
๏ Using Redis as a secondary index for MySQL. Flickr Engineering Blog (2013)▸
๏ How we made GitHub fast. The GitHub Blog (2009)▸
๏ Real world Redis. Agora Games (2012)▸
๏ Disqus discusses migration from Redis to Cassandra for horizontal Scalability. Planet Cassandra (2013)▸
Memory is the new disk
๏ BSD licensed in-memory data structure server
‣ Strings, hashes, lists, sets…
๏ Optional durability
๏ Bindings to almost all relevant languages
“Memory is the new disk, disk is the new tape” (Jim Gray)
A fight against complexity
๏ Simple & robust foundations
‣ Single threaded
‣ No map-reduce, no indexes, no vector clocks, no Paxos, no Merkle trees, no gossip protocols…
๏ Blazingly fast
‣ Implemented in C (20K LoC for the 2.2 release)
‣ No dependencies
A fight against complexity
…
5. We’re against complexity. We believe designing systems is a fight against complexity. […] Most of the time the best way to fight complexity is by not creating it at all.
…
The Redis Manifesto▸
Most popular K-V DB
๏ Currently most popular key-value DB▸
๏ Redis 1.0 (April’09) ↝ Redis 2.8.6 (March’14)
Google Trends▸
Latest releases I
๏ Redis 2.6 (October’12)
‣ Lua scripting
‣ New commands
‣ Millisecond-precision expires
‣ Unlimited number of clients
‣ Improved AOF generation
Latest releases II
๏ Redis 2.8 (November’13)
‣ Redis 2.7 minus the clustering code
‣ Partial resynchronization with slaves
‣ IPv6 support
‣ Config rewriting
‣ Key-space change notifications via Pub/Sub
Latest releases III
๏ Redis 3.0
‣ Next beta release planned for March’14
‣ Redis Cluster
‣ Speed improvements under certain workloads
Performance
No pipelining
$ redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -q

SET: 122556.53 requests per second
GET: 123601.76 requests per second
LPUSH: 136752.14 requests per second
LPOP: 132424.03 requests per second
Performance
16 commands per pipeline
$ redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -P 16 -q

SET: 552028.75 requests per second
GET: 707463.75 requests per second
LPUSH: 767459.75 requests per second
LPOP: 770119.38 requests per second
Overview
๏ Family of fundamental data structures
‣ Strings and string containers
‣ Accessed / indexed by key
‣ Directly exposed: no abstraction layers
๏ Rich set of atomic operations over the structures
‣ Detailed reference using big-O notation for complexities
๏ Basic publish / subscribe infrastructure
Keys
๏ Arbitrary ASCII strings
‣ Define some format convention and adhere to it
‣ Key length matters!
๏ Multiple name spaces are available
‣ Separate DBs indexed by an integer value
- SELECT command
‣ Multiple DBs vs. single DB + key prefixes
๏ Keys can expire automatically
Publish / Subscribe
Overview
๏ Classic pattern decoupling publishers & subscribers
‣ You can subscribe to channels; when someone publishes to a channel matching your interests, Redis will send the message to you
‣ SUBSCRIBE, UNSUBSCRIBE & PUBLISH commands
๏ Fire and forget notifications
‣ Not suitable for reliable off-line notification of events
๏ Pattern-matching subscriptions
‣ PSUBSCRIBE & PUNSUBSCRIBE commands
Publish / Subscribe
Key-space notifications
๏ Available since Redis 2.8
‣ Disabled in the default configuration
‣ Key-space vs. key-event notifications
๏ Delay of key expiration events
‣ Expired events are generated when Redis deletes the key, not when the TTL reaches zero
- Lazy (i.e. at access time) key expiration
- Background key expiration process
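The two deletion paths above explain why an expired event can arrive late. The following is only a toy sketch of that behavior, not how Redis is implemented: `ExpiringStore`, its `events` list, and the injectable `clock` are hypothetical names, and the sweep scans every key with a TTL, which is a simplification.

```python
import time


class ExpiringStore:
    """Toy key-value store mimicking lazy and background key expiration."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock, handy for experiments
        self._data = {}          # key -> value
        self._deadline = {}      # key -> absolute expiry time
        self.events = []         # "expired" notifications, in emission order

    def set(self, key, value, ttl=None):
        self._data[key] = value
        if ttl is None:
            self._deadline.pop(key, None)
        else:
            self._deadline[key] = self._clock() + ttl

    def _expire_if_due(self, key):
        deadline = self._deadline.get(key)
        if deadline is not None and self._clock() >= deadline:
            del self._data[key]
            del self._deadline[key]
            # The event is emitted at deletion time, not at the deadline.
            self.events.append(("expired", key))

    def get(self, key):
        self._expire_if_due(key)  # lazy path: checked on access
        return self._data.get(key)

    def sweep(self):
        """Background path: delete every key whose deadline has passed."""
        for key in list(self._deadline):
            self._expire_if_due(key)


now = [0.0]
store = ExpiringStore(clock=lambda: now[0])
store.set("a", 1, ttl=5)
store.set("b", 2, ttl=5)
now[0] = 10.0                     # both TTLs are over, nothing deleted yet
events_before_access = list(store.events)
value_a = store.get("a")          # lazy deletion of "a"
store.sweep()                     # background deletion of "b"
```

Until `get` or `sweep` runs, no event exists even though both deadlines passed, which is the delay the slide warns about.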
Pipelining
๏ Redis pipelines are just an RTT optimization
‣ Deliver multiple commands together without waiting for replies
‣ Fetch all replies in a single step
- Server needs to buffer all replies!
๏ Pipelines are NOT transactional or atomic
๏ Redis scripting FTW!
‣ Much more flexible alternative
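To make the "deliver multiple commands together" idea concrete, here is a minimal sketch that encodes commands in the Redis wire protocol (arrays of bulk strings) and concatenates them into one payload. The helper names `encode_command` and `encode_pipeline` are made up for this illustration; a real client would also have to read the same number of replies back from the socket.

```python
def encode_command(*args):
    """Encode one command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def encode_pipeline(commands):
    """Concatenate several encoded commands into a single payload."""
    return b"".join(encode_command(*cmd) for cmd in commands)


# Three commands, one write to the socket, one round trip.
payload = encode_pipeline([
    ("SET", "foo", 100),
    ("INCR", "foo"),
    ("GET", "foo"),
])
```

Nothing in the payload groups the commands: the server executes them one by one and may interleave commands from other clients, which is why a pipeline is not a transaction.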
Transactions
๏ Or, more precisely, “transactions”
‣ Commands are executed as a single atomic & isolated operation
- Rollback is not supported!
‣ Partial execution is possible due to pre/post EXEC failures!
๏ MULTI, EXEC & DISCARD commands
‣ Conditional EXEC with WATCH
๏ Redis scripting FTW!
‣ Redis transactions are complex and cumbersome
Scripting
Overview I
๏ Added in Redis 2.6
๏ Uses the Lua 5.1 programming language▸
‣ Base, Table, String, Math & Debug libraries
‣ Built-in support for JSON and MessagePack
‣ No global variables
‣ redis.{call(), pcall()}
‣ redis.{error_reply(), status_reply(), log()}
Scripting
Overview II
๏ Scripts are atomic, like any other command
‣ Single thread
‣ Maximum execution time vs. atomic execution
๏ Scripts add minimal overhead
‣ Shared Lua context
๏ Scripts are replicated on slaves by sending the script (i.e. not the resulting commands)
‣ Scripts are required to be pure functions
Scripting
What is fixed with scripting?
๏ Server side manipulation of data
๏ Minimizes latency
‣ No round trip delay
๏ Maximizes CPU usage
‣ Less parsing
‣ Fewer OS system calls
๏ Simpler & faster alternative to WATCH
Scripting
Scripts vs. stored procedures
๏ Stored procedures are evil
๏ Backend logic should be 100% application side
‣ No hidden behaviors
‣ No crazy version management
๏ Redis keys are explicitly declared as parameters of the script
‣ Cluster friendly
‣ Hashed scripts
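Scripting
DECREMENT-IF-GREATER-THAN
EVAL "
-- GET yields false (not nil) when the key does not exist
local res = redis.call('GET', KEYS[1]);

if res then
  res = tonumber(res);
  -- decrement only when the stored value is a number above the threshold
  if res ~= nil and res > tonumber(ARGV[1]) then
    res = redis.call('DECR', KEYS[1]);
  end
end

return res" 1 foo 100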
Scripting
Some more commands
๏ EVALSHA sha1 nkeys key [key…] arg [arg…]
‣ Client libraries optimistically use EVALSHA
- On NOSCRIPT error, EVAL is used
‣ Automatic version management
๏ SCRIPT LOAD script
‣ Cached scripts are not flushed until server restart
‣ Ensures EVALSHA will not fail (e.g. MULTI/EXEC)
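The optimistic EVALSHA pattern can be sketched without a live server. In the sketch below only the SHA-1 hex digest of the script body reflects what EVALSHA actually expects; `FakeServer`, `run_script`, and the use of `LookupError` to signal NOSCRIPT are hypothetical stand-ins, not the API of any client library.

```python
import hashlib

SCRIPT = "return redis.call('GET', KEYS[1])"


def script_sha1(script):
    """SHA-1 hex digest of the script body, the identifier EVALSHA expects."""
    return hashlib.sha1(script.encode("utf-8")).hexdigest()


class FakeServer:
    """Hypothetical stand-in for a server-side script cache."""

    def __init__(self):
        self.cache = {}
        self.calls = []

    def evalsha(self, sha, *args):
        self.calls.append("EVALSHA")
        if sha not in self.cache:
            raise LookupError("NOSCRIPT No matching script")
        return "ran " + sha[:8]

    def eval(self, script, *args):
        self.calls.append("EVAL")
        sha = script_sha1(script)
        self.cache[sha] = script  # EVAL also caches the script
        return "ran " + sha[:8]


def run_script(server, script, *args):
    """Try EVALSHA first; on a NOSCRIPT-style error, fall back to EVAL."""
    try:
        return server.evalsha(script_sha1(script), *args)
    except LookupError:
        return server.eval(script, *args)


server = FakeServer()
first = run_script(server, SCRIPT, 1, "foo")   # cache miss, falls back to EVAL
second = run_script(server, SCRIPT, 1, "foo")  # cache hit, EVALSHA only
```

Because the identifier is derived from the script text, editing the script yields a new digest, which is the "automatic version management" the slide refers to.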
Some examples I
๏ 11 common web use cases solved in Redis▸
๏ How to take advantage of Redis just adding it to your stack▸
๏ A case study: design and implementation of a simple Twitter clone using only PHP and Redis▸
๏ Scaling Crashlytics: building analytics on Redis 2.6▸
Some examples II
๏ Fast, easy, realtime metrics using Redis bitmaps▸
๏ Redis - NoSQL data store▸
๏ Auto complete with Redis▸
๏ Multi user high performance web chat▸
Persistence
Overview
๏ The whole dataset needs to fit in memory
‣ Very high read & write rates
‣ Durability is optional
‣ Optimal & simple memory and disk representations
๏ What if Redis runs out of memory?
‣ Swapping
- Performance degradation
‣ Hit maxmemory limit
- Failed writes or eviction policy
Persistence
Snapshotting: RDB
๏ Periodic asynchronous point-in-time dump to disk
‣ Every S seconds and C changes
‣ Fast service restarts
๏ Possible data loss during a crash
๏ Compact files
๏ Minimal overhead during operation
๏ Huge data sets may experience short delays during fork()
๏ Copy-on-write fork() semantics
‣ 2x memory problem
Persistence
Append only file: AOF
๏ Journal file logging every write operation
‣ Configurable fsync frequency: speed vs. safety
‣ Commands replayed when server restarts
๏ Not as compact as RDB
‣ Safe background AOF file rewrite via fork()
๏ Overhead during operation depends on fsync behavior
๏ Recommended to use both RDB + AOF
‣ RDB is the way to go for backups & disaster recovery
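The journal-and-replay idea behind the AOF can be illustrated with a toy store. This is a sketch under loud assumptions: `TinyAOFStore`, its JSON line format, and the `fsync_always` flag are invented for the example and have nothing to do with the real AOF file format or Redis configuration names.

```python
import json
import os
import tempfile


class TinyAOFStore:
    """Toy store that journals every write and replays the journal on start."""

    def __init__(self, path, fsync_always=False):
        self.path = path
        self.fsync_always = fsync_always  # speed vs. safety knob
        self.data = {}
        self._replay()
        self._log = open(path, "a", encoding="utf-8")

    def _apply(self, op, key, value=None):
        if op == "SET":
            self.data[key] = value
        elif op == "DEL":
            self.data.pop(key, None)

    def _replay(self):
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as log:
            for line in log:
                self._apply(*json.loads(line))

    def _write(self, op, key, value=None):
        self._log.write(json.dumps([op, key, value]) + "\n")
        self._log.flush()
        if self.fsync_always:
            os.fsync(self._log.fileno())  # durable, but slower
        self._apply(op, key, value)

    def set(self, key, value):
        self._write("SET", key, value)

    def delete(self, key):
        self._write("DEL", key)

    def close(self):
        self._log.close()


path = os.path.join(tempfile.mkdtemp(), "toy.aof")
store = TinyAOFStore(path, fsync_always=True)
store.set("a", 1)
store.set("b", 2)
store.delete("a")
store.close()

restarted = TinyAOFStore(path)  # simulated restart: state rebuilt by replay
```

The journal keeps all three operations even though only one key survives, which hints at why such a file is less compact than a snapshot and why a rewrite step is useful.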
Security
๏ Designed for trusted clients in trusted environments
‣ No users, no access control, no connection filtering…
๏ Basic unencrypted AUTH command
‣ requirepass s3cr3t
๏ Command renaming
‣ rename-command FLUSHALL f1u5hc0mm4nd
‣ rename-command FLUSHALL ""
Replication
Overview I
๏ One master, multiple slaves
‣ Scalability & redundancy
- Client side failover, eviction, query routing…
‣ Lightweight master
๏ Slaves are able to accept other slave connections
๏ Non-blocking in the master, but blocking on the slaves
๏ Asynchronous but periodically acknowledged
Replication
Overview II
๏ Automatic slave reconnection
๏ Partial resynchronization: PSYNC vs. SYNC
‣ RDB snapshots are used during initial SYNC
๏ Read-write slaves
‣ slave-read-only no
‣ Ephemeral data storage
๏ Minimum replication factor
Replication
Some commands & configuration
๏ Trivial setup
‣ slaveof <host> <port>
‣ SLAVEOF [<host> <port> | NO ONE]
๏ Some more configuration tips
‣ slave-serve-stale-data [yes|no]
‣ repl-ping-slave-period <seconds>
‣ masterauth <password>
Performance
General tips
๏ Fast CPUs with large caches and not many cores
๏ Do not invest in expensive fast memory modules
๏ Avoid virtual machines
๏ Use UNIX domain sockets when possible
๏ Aggregate commands when possible
๏ Keep the number of client connections low
Performance
Advanced optimization
๏ Special encoding of small aggregate data types
๏ 32 vs. 64 bit instances
๏ Consider using bit & byte level operations
๏ Use hashes when possible
๏ Always check big-O notation complexities
Performance
Understanding metrics I
๏ redis-cli --latency
‣ Typical latency for a 1 Gbit/s network is 200 μs
‣ SLOWLOG GET
‣ Monitor the number of client connections and consider using a multiplexing proxy
‣ Improve memory management
Performance
Understanding metrics II
๏ redis-cli INFO | grep …
๏ used_memory
‣ Usually lower than used_memory_rss
- Used memory as seen by the OS
‣ Swapping risk when approaching 45% / 95%
‣ Reduce Redis footprint when possible
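As a small illustration of reading these metrics programmatically, the sketch below parses `field:value` lines in the style of INFO output and compares `used_memory_rss` with `used_memory`. The sample text and its numbers are invented for the example, and `parse_info` / `rss_ratio` are hypothetical helper names.

```python
# Made-up sample in the style of the INFO memory section.
SAMPLE_INFO = """\
# Memory
used_memory:1048576
used_memory_human:1.00M
used_memory_rss:1572864
mem_fragmentation_ratio:1.50
"""


def parse_info(text):
    """Parse 'field:value' lines, skipping blank lines and section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(":")
        fields[name] = value
    return fields


def rss_ratio(fields):
    """Ratio of OS-visible memory (RSS) to memory allocated by Redis."""
    return int(fields["used_memory_rss"]) / int(fields["used_memory"])


info = parse_info(SAMPLE_INFO)
ratio = rss_ratio(info)
```

A ratio well above 1 suggests fragmentation or allocator overhead; a ratio below 1 would suggest part of the dataset has been swapped out.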
Redis pools
๏ Redis is lightweight and has an extremely small footprint
๏ Multiple Redis instances per node
‣ Full CPU usage
‣ Mitigated RDB 2x memory problem
‣ Fine tuned instances
๏ How to use multiple instances?
‣ Sharding
‣ Specialized instances
Redis Sentinel
Overview I
๏ Official Redis HA / failover solution
‣ Periodically check liveness of Redis instances
‣ On master failure, choose slave & promote to master
‣ Notify clients & slaves about the new master
๏ Multiple Sentinels
‣ Complex distributed system
‣ Gossip, quorum & leader election algorithms
Redis Sentinel
Overview II
๏ Work in progress, not ready for production
๏ Master pub/sub capabilities
‣ Auto discovery of other sentinels & slaves
‣ Notification of master failover
๏ Explicit client support required
๏ Redis Sentinel is a monitoring system with support for automatic failover. It does not turn Redis into a distributed data store. CAP discussions do not apply▸
Redis Sentinel
Apache ZooKeeper
๏ Set of primitives to ease building distributed systems
‣ http://zookeeper.apache.org
‣ Handling of network partitions, leader election, quorum management…
‣ Replicated, highly available, well-known…
๏ Ad-hoc Redis HA alternative to Sentinel
‣ Explicit client implementation required
Redis Cluster
๏ Long term project to be released in Redis 3.0
๏ High performance & linearly scalable complex distributed DB
‣ Sharding across multiple nodes
‣ Graceful handling of network partitions
๏ Implemented subset
‣ Commands dealing with multiple keys, etc. not supported
‣ Multiple databases are not supported
๏ Key hash tags
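Hash tags let related keys land on the same node. The sketch below follows the published Redis Cluster specification as I understand it (16384 slots, CRC16 of the key, and only the text between the first `{` and the next `}` is hashed when that text is non-empty); the function name `hash_slot` is an assumption of this example, not a Redis API.

```python
import binascii

SLOTS = 16384  # number of hash slots in Redis Cluster


def hash_slot(key):
    """Map a key to a cluster slot, honouring {hash tags}."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:  # ignore an empty "{}" tag
            data = data[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16 (XMODEM) variant.
    return binascii.crc_hqx(data, 0) % SLOTS


following = hash_slot("{user1000}.following")
followers = hash_slot("{user1000}.followers")
```

Both keys hash only the `user1000` part, so they map to the same slot and stay eligible for multi-key operations.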
Sharding
Overview I
๏ Distribute data into multiple Redis instances
‣ Allows much larger databases
‣ Allows scaling the computational power
๏ Data distribution strategies
‣ Directory based
‣ Ranges
‣ Hash + Modulo
‣ Consistent hashing
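The last two strategies can be compared with a short sketch. It is a minimal illustration, not any particular client's implementation: the instance names, the MD5-based `_hash` helper, and the choice of 100 virtual nodes per instance are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value):
    """Stable 64-bit hash (Python's built-in hash() is randomized per run)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def modulo_shard(key, nodes):
    """Hash + modulo: simple, but most keys move when len(nodes) changes."""
    return nodes[_hash(key) % len(nodes)]


class ConsistentHashRing:
    """Consistent hashing with virtual nodes."""

    def __init__(self, nodes, replicas=100):
        self._ring = sorted(
            (_hash("%s#%d" % (node, i)), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


keys = ["user:%d" % i for i in range(1000)]
three = ["redis-a", "redis-b", "redis-c"]  # hypothetical instance names
four = three + ["redis-d"]

moved_modulo = sum(modulo_shard(k, three) != modulo_shard(k, four) for k in keys)
ring3, ring4 = ConsistentHashRing(three), ConsistentHashRing(four)
moved_ring = sum(ring3.node_for(k) != ring4.node_for(k) for k in keys)
```

When a fourth instance is added, the ring typically relocates far fewer keys than hash + modulo, and every relocated key goes to the new instance.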
Sharding
Overview II
๏ Data distribution responsibility
‣ Client side
‣ Proxy assisted
‣ Query routing
๏ Do I really need sharding?
‣ It is very unlikely that the CPU becomes the bottleneck with Redis
‣ 500K requests per second!
Sharding
Disadvantages
๏ Multi-key commands are not supported
๏ Multi-key transactions are not supported
๏ Sharding unit is the key
๏ Harder client logic
๏ Complex to scale up/down when used as a store
Sharding
Presharding
๏ Hard to scale up/down sharded databases
‣ But data storage needs may vary over time
๏ Take advantage of small Redis footprint
‣ Think big!
๏ Redis replication allows moving instances with minimal downtime
Sharding
Twemproxy overview I
๏ Redis Cluster is currently not production ready
‣ Mix between query routing & client side partitioning
๏ Not all Redis clients support sharding
๏ Automatic sharding Redis & Memcached (ASCII) proxy
‣ Developed by Twitter & Apache 2.0 licensed
‣ https://github.com/twitter/twemproxy/
‣ Single threaded & extremely fast
Sharding
Twemproxy overview II
๏ Also known as nutcracker
๏ Connection multiplexer pipelining requests and responses
‣ Original motivation
๏ No bottleneck or single point of failure
๏ Optional node ejection
‣ Only useful when using Redis as a cache
Sharding
Why Twemproxy?
๏ Multiplexed persistent server connections
๏ Automatic sharding and protocol pipelining
๏ Multiple distribution algorithms supporting nicknames
๏ Simple dumb clients
๏ Automatic fault tolerance capabilities
๏ Zero copy
Sharding
Why not Twemproxy?
๏ Extra network hop
‣ Pipelining is your friend
๏ Not all commands supported
‣ Transactions
‣ Pub / Sub
๏ HA not supported
‣ Redis Sentinel Twemproxy agent