Caching:
Redis & Memcached
Cheng Zhang, Koosha Khajehmoogahi
Agenda
• What is caching?
• What is Redis?
• What is Memcached?
• Redis vs. Memcached
Caching is omnipresent
A sample scenario: Website
Redis vs Memcached
In-memory Key-Value Cache
What is Redis
Redis is an open source, BSD licensed, advanced key-value
cache and store. It is often referred to as a data structure
server since values can be strings, hashes, lists, sets,
sorted sets, bitmaps and HyperLogLogs. [1]
[1] http://redis.io
History
Redis 3.0 Redis cluster supported
Redis 2.8 Asynchronous replication used
Redis 2.6 Lua interpreter built to evaluate scripts
…
Redis 1.0 April.09
Platforms & Clients
C, C#, C++, Go, Java, Node.js,
Objective-C, Perl, PHP, Python,
Ruby, Scala …
Technical Architecture
• Persistence
• Virtual Memory
• Key Value
• Master/Slave
• Cluster
• Pub/Sub
Persistence
• Datasets can be saved to disk.
1. RDB (Redis DB) snapshots (default)
RDB is a very compact single-file point-in-time
representation of the Redis dataset.

2. AOF (Append Only File) logs
AOF is a simple text log of write operations.
Virtual Memory
The goal of the VM subsystem is to free memory by transferring Redis
objects from memory to disk.
[Figure: keys stay in memory; values are moved between memory and a swap file on disk.]
Note: Virtual Memory was deprecated in Redis 2.4 and removed in Redis 2.6.
Key Value
Redis keys are binary safe: you can use any binary sequence
as a key, from a string like "foo" to the content of a JPEG file. The empty
string is also a valid key.
A common naming convention is object-type:id, for example user:1000.
Key Value
• Binary-safe strings.
• Lists: collections of string elements sorted according to the order of
insertion. They are basically linked lists.
• Sets: collections of unique, unsorted string elements.
• Sorted sets: similar to Sets, but every string element is associated with
a floating-point value, called the score.
• Hashes: maps composed of fields associated with values. Both
the field and the value are strings.
• Bit arrays (or simply bitmaps): it is possible, using special commands, to
handle String values like an array of bits: you can set and clear individual
bits, count all the bits set to 1, find the first set or unset bit, and so forth.
• HyperLogLogs: a probabilistic data structure used to
estimate the cardinality of a set.
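The bit array item above can be illustrated with a small sketch. This is not Redis code: it is a plain Python model of what set-bit, get-bit and count-bits commands do to a string value, assuming (as Redis does) that bit offset 0 is the most significant bit of the first byte. The function names are chosen here for illustration.

```python
def setbit(buf: bytearray, offset: int, value: int) -> int:
    """Set or clear one bit, growing the buffer with zero bytes if needed.

    Returns the previous value of the bit.
    """
    byte_index, bit_index = divmod(offset, 8)
    if byte_index >= len(buf):
        buf.extend(b"\x00" * (byte_index + 1 - len(buf)))
    mask = 0x80 >> bit_index  # offset 0 is the most significant bit
    previous = 1 if buf[byte_index] & mask else 0
    if value:
        buf[byte_index] |= mask
    else:
        buf[byte_index] &= ~mask & 0xFF
    return previous


def getbit(buf: bytearray, offset: int) -> int:
    """Read one bit; bits beyond the end of the buffer read as 0."""
    byte_index, bit_index = divmod(offset, 8)
    if byte_index >= len(buf):
        return 0
    return 1 if buf[byte_index] & (0x80 >> bit_index) else 0


def bitcount(buf: bytearray) -> int:
    """Count all bits set to 1."""
    return sum(bin(byte).count("1") for byte in buf)
```

Because a bitmap is just a string underneath, a million yes/no flags fit in roughly 122 KB.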
Cluster
Master A: slots 0-5500
Master B: slots 5501-11000
Master C: slots 11001-16383
Cluster Bus
Every Redis Cluster node requires two TCP ports open.
• The normal Redis TCP port used to serve clients.
• The high port is used for the Cluster bus, a node-to-node
communication channel using a binary protocol.
A given key "foo" is at slot:
slot = crc16("foo") mod NUMBER_SLOTS (NUMBER_SLOTS = 16384)
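The slot formula can be sketched in a few lines of Python. The CRC16 variant used by Redis Cluster is the XMODEM one (polynomial 0x1021, initial value 0). The function names below are chosen for illustration, and the sketch deliberately ignores hash tags (the `{...}` syntax that forces several keys into the same slot).

```python
NUMBER_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16, XMODEM variant: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of the 16384 hash slots (hash tags not handled)."""
    return crc16_xmodem(key.encode("utf-8")) % NUMBER_SLOTS


if __name__ == "__main__":
    print(key_slot("foo"))
```

With the slot ranges shown on the slide, the node responsible for a key is simply the master whose range contains `key_slot(key)`.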
Cluster
• All nodes are directly connected with a service channel:
TCP base port + 10000, for example 6379 -> 16379.
• The node-to-node protocol is binary, optimized for bandwidth and speed.
• Clients talk to nodes as usual, using the ASCII protocol, with minor additions.
• Nodes don't proxy queries.
Cluster
• Dumb client
1. Client => A: GET foo
2. A => Client: -MOVED 8 192.168.5.21:6391
3. Client => B: GET foo
4. B => Client: "bar"
• Smart client
1. Client => A: CLUSTER SLOTS
2. A => Client: ... a map of hash slots -> nodes
3. Client => B: GET foo
4. B => Client: "bar"
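The redirect exchange above can be sketched as follows. This is a toy model, not a real client: `FakeNode` is a hypothetical in-process stand-in for a cluster node, replies are plain Python strings, and a reply that starts with `-MOVED` is treated as the redirection error (a real client distinguishes error replies at the protocol level). The client follows the redirect and remembers the slot owner, which is the first step from a dumb client toward a smart one.

```python
from typing import Dict, Optional, Tuple


def parse_moved(reply: str) -> Optional[Tuple[int, str]]:
    """Return (slot, "host:port") for a -MOVED reply, otherwise None."""
    if not reply.startswith("-MOVED "):
        return None
    _, slot, address = reply.split()
    return int(slot), address


class FakeNode:
    """Hypothetical stand-in for one cluster node (illustration only)."""

    def __init__(self, data: Dict[str, str],
                 redirects: Dict[str, Tuple[int, str]]):
        self.data = data            # keys stored on this node
        self.redirects = redirects  # keys whose slot lives on another node

    def get(self, key: str) -> Optional[str]:
        if key in self.redirects:
            slot, address = self.redirects[key]
            return f"-MOVED {slot} {address}"
        return self.data.get(key)


def cluster_get(nodes: Dict[str, FakeNode], start: str, key: str,
                slot_map: Dict[int, str], max_redirects: int = 5):
    """GET a key, following -MOVED replies and recording slot owners."""
    address = start
    for _ in range(max_redirects):
        reply = nodes[address].get(key)
        moved = parse_moved(reply) if isinstance(reply, str) else None
        if moved is None:
            return reply
        slot, address = moved
        slot_map[slot] = address  # remember the owner for later requests
    raise RuntimeError("too many redirects")
```

A smart client would consult `slot_map` before sending the first request, so most commands reach the right node without any redirect.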
Master/Slave
[Figure: a master with a 1st slave and a 2nd slave; each slave is attached with the command below.]
slaveof masterip port
Model Flow
(Participants in each flow: Client, Master, Slave)

Async Model Flow
1. Request
2. Response
3. Binlog Copy
4. Apply Replicate

Sync Model Flow
1. Request
2. Binlog Copy
3. Apply Replicate
4. Response

Semi-Sync Model Flow
1. Request
2. Binlog Copy
3. Response
4. Apply Replicate
Cluster & Master/Slave
• Each master serves a set of hash slots and has two slaves holding the same slots:
Master A (1, 2, 4): Slave A1, Slave A2
Master B (3, 7, 8): Slave B1, Slave B2
Master C (5, 6, 9, 10): Slave C1, Slave C2
• Overloaded: a new Master D is added and slot 10 moves from Master C to Master D,
leaving Master C, Slave C1 and Slave C2 with slots 5, 6, 9.
• Failed: Master B fails and Slave B1 takes over as Master B (3, 7, 8); Slave B2 remains its slave.
Example: LINE with Redis
Pub/Sub
[Figure: Redis hosts several channels. Subscribers attach to a channel; when a publisher sends "Message: Hello" to that channel, every subscriber of the channel receives it.]
redis> PSUBSCRIBE news.*
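The delivery rule, including pattern subscriptions such as `news.*`, can be sketched with an in-memory broker. This is a toy model rather than Redis itself: the `Broker` class and its method names are invented for illustration, and Python's `fnmatchcase` is used as an approximation of Redis glob-style patterns (the two differ in escaping details).

```python
from collections import defaultdict
from fnmatch import fnmatchcase
from typing import Callable, DefaultDict, List

Callback = Callable[[str, str], None]  # called with (channel, message)


class Broker:
    """Toy publish/subscribe broker with exact and pattern subscriptions."""

    def __init__(self) -> None:
        self.channels: DefaultDict[str, List[Callback]] = defaultdict(list)
        self.patterns: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callback) -> None:
        self.channels[channel].append(callback)

    def psubscribe(self, pattern: str, callback: Callback) -> None:
        self.patterns[pattern].append(callback)

    def publish(self, channel: str, message: str) -> int:
        """Deliver a message and return the number of receivers."""
        receivers = 0
        for callback in self.channels.get(channel, []):
            callback(channel, message)
            receivers += 1
        for pattern, callbacks in self.patterns.items():
            if fnmatchcase(channel, pattern):
                for callback in callbacks:
                    callback(channel, message)
                    receivers += 1
        return receivers
```

Messages are fire-and-forget in this model: a subscriber that attaches after a message was published never sees it.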
Pub/Sub
[Figure: HTTP, MQTT and CoAP servers, each serving its own clients, share Redis for persistence and Pub/Sub.]
Example with Redis in Publish / Subscribe
Conclusion
Redis is a different evolution path in the key-value DBs where
values can contain more complex data types, with atomic
operations defined on those data types.

Redis is an in-memory database that can persist to disk, so it
represents a different trade-off: very high write and read
speed is achieved, with the limitation that data sets can't be
larger than memory.
Memcached
• What is Memcached?
• Where is it used?
• How is it used?
• How does it work internally?
• A Cluster of Memcached Servers
• Memcached and CAP Theorem
• Memcached vs Redis
What is Memcached?
Memcached is a free and open-source general-purpose
distributed in-memory object caching system intended to be
used for web applications.
Its primary goal is to cache the data returned from database
interactions and/or cache the results of costly operations and
API calls.
History & Characteristics
• Originally written in Perl in 2003 for LiveJournal.com,
then rewritten in C
• First release in May 2003
• Released under Revised BSD License
• Platform-independent (Windows, Linux, BSD, UNIX, Mac)
• Client APIs in many different languages:
• C, C++, Java, PHP, Perl, Python, Ruby, Windows .NET, Lua, …
• Used by many giant web sites: Facebook, YouTube, Reddit, Twitter,
Wikipedia, Flickr, etc.
Where is it used? (Architectural view)
Memcached resides between the front-end web tier and the back-end database tier.
The front-end web tier first tries to fetch data from Memcached and
interacts with the database only in case of a cache miss.
How is it used? (Conceptual view)
From a programmer’s point of view, Memcached is a
collection of key-value items stored in memory.
Keys are unique identifiers with a length of up to 250 bytes.
Values can have any format, including raw binary data, and
they can contain up to 1 MB of data (by default).
How is it used? (Conceptual view)
Example (in Perl):
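use strict;
use warnings;
use Cache::Memcached;

# Connect to two Memcached servers; the client picks one per key.
my $cache = Cache::Memcached->new({
    servers => ["192.168.0.10:11211",
                "192.168.0.20:11211"],
});

my $id = 42;  # example user id (assumed for illustration)

# Try the cache first; on a miss, build the object and cache it.
# User is the application's own class, not part of Cache::Memcached.
my $user_info = $cache->get("user:$id");
if (!$user_info) {
    $user_info = User->new($id);
    $cache->set("user:$id", $user_info);
}
print $user_info->name;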
Some Common Operations
• set(key, val)
• add(key, val): like set, but only works if the key does not already exist
• replace(key, val): like set, but only works if the key already exists
• get(key)
• delete(key)
• incr(key)
• decr(key)
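The differences between these operations are easiest to see in a toy model. The class below is not a Memcached client: it is a plain dictionary wrapped with the same semantics, assuming a default step of 1 for incr/decr (the slide shows no step argument), that incr/decr fail on a missing key, and that decr never goes below zero.

```python
from typing import Dict, Optional


class ToyCache:
    """In-memory model of the common Memcached operations."""

    def __init__(self) -> None:
        self.items: Dict[str, object] = {}

    def set(self, key: str, val: object) -> bool:
        self.items[key] = val
        return True

    def add(self, key: str, val: object) -> bool:
        """Store only if the key does not exist yet."""
        if key in self.items:
            return False
        self.items[key] = val
        return True

    def replace(self, key: str, val: object) -> bool:
        """Store only if the key already exists."""
        if key not in self.items:
            return False
        self.items[key] = val
        return True

    def get(self, key: str) -> Optional[object]:
        return self.items.get(key)

    def delete(self, key: str) -> bool:
        if key in self.items:
            del self.items[key]
            return True
        return False

    def incr(self, key: str, delta: int = 1) -> Optional[int]:
        """Increase a numeric value; returns None if the key is missing."""
        if key not in self.items:
            return None
        value = int(self.items[key]) + delta
        self.items[key] = value
        return value

    def decr(self, key: str, delta: int = 1) -> Optional[int]:
        """Decrease a numeric value, never going below zero."""
        if key not in self.items:
            return None
        value = max(0, int(self.items[key]) - delta)
        self.items[key] = value
        return value
```

A typical use of add is a cheap "first writer wins" guard: only one of several concurrent callers gets True back.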
How does it work internally? (Physical view)
• Characteristics of data items
• Internal data storage and memory management
• Cluster topology
Characteristics of data items
• Each data item consists of a key, a value, optional flags and
an optional expiration time (TTL)
• Data is removed from the cache when its TTL has expired,
or when the cache is full and the item is selected by the
replacement algorithm
• Memcached uses a Least Recently Used (LRU) policy to free
space in the cache
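The two removal rules (TTL expiry and LRU eviction) can be sketched together. This is a simplified model, not Memcached's implementation: it uses a single `OrderedDict` instead of per-slab lists, checks expiry lazily when an item is read, and takes an injectable `clock` function (an assumption made here so the behaviour can be tested without sleeping).

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class TTLLRUCache:
    """Cache with optional per-item TTL and LRU eviction when full."""

    def __init__(self, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.clock = clock
        # key -> (value, expiry time or None); order = least to most recent
        self.items: "OrderedDict[str, tuple]" = OrderedDict()

    def set(self, key: str, value: object,
            ttl: Optional[float] = None) -> None:
        expires_at = None if ttl is None else self.clock() + ttl
        self.items[key] = (value, expires_at)
        self.items.move_to_end(key)          # newly written = most recent
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used

    def get(self, key: str) -> Optional[object]:
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self.clock() >= expires_at:
            del self.items[key]              # expired: remove on access
            return None
        self.items.move_to_end(key)          # a read refreshes recency
        return value
```

Reading an item protects it from eviction, so frequently used entries survive while cold ones fall off the tail.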
Internal data storage and memory management
Memcached has 4 primary internal data structures:
• A hash table to store and locate cache items
• A Least Recently Used (LRU) list to specify cache item
removal (eviction) when the cache is full
• A cache item data structure for storing key, flags, data and
pointers
• A slab allocator which handles memory management for
cache items
Hash Table
• The hash table data structure is an array whose elements are buckets
• Each bucket is a singly linked list of cache items
• Each cache item holds the pointer to the next item in its bucket
LRU Data Structure
• LRU is used to determine which cache item to evict
• LRU is a doubly linked list holding all cache items in a particular slab class
• Each slab class has its own LRU data structure
• The LRU pointers are located in each cache item data structure
• Each access/modification of a cache item updates its pointers in its LRU
• The last item (tail) of each list is the item to be removed in the event of eviction
Slab Allocator
Problems with malloc() and free() functions for memory management:
• Poor performance
• Memory fragmentation after some time
To overcome these problems, Memcached uses a Slab Allocator:
• The allowed amount of memory is allocated all at once at startup time
• The memory is divided into pages that are equal in size, and each page belongs to a slab class
• Each page is again divided into smaller pieces for data storage called chunks; all chunks
in a slab class have the same size
• The slab allocator takes care of pages and chunks
• A cache item will be located in the smallest chunk that is bigger than or equal to the item
Slab Allocator
Example:
2048 bytes of allowed memory is split into 4 slab classes (512 bytes each) and
each slab class is divided into chunks
slab class 1: 8 chunks of 64 B
slab class 2: 4 chunks of 128 B
slab class 3: 2 chunks of 256 B
slab class 4: 1 chunk of 512 B
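The placement rule (smallest chunk that fits the item) can be sketched with a binary search over the chunk sizes from the example above. The function names are chosen here for illustration, and the sizes are the toy values from the slide, not real Memcached defaults.

```python
from bisect import bisect_left
from typing import List, Optional

# Chunk sizes of slab classes 1..4 in the example (bytes).
CHUNK_SIZES: List[int] = [64, 128, 256, 512]


def pick_slab_class(item_size: int,
                    chunk_sizes: List[int] = CHUNK_SIZES) -> Optional[int]:
    """Return the 1-based slab class whose chunk is the smallest that fits.

    Returns None when the item is larger than the largest chunk.
    """
    index = bisect_left(chunk_sizes, item_size)
    if index == len(chunk_sizes):
        return None
    return index + 1


def wasted_bytes(item_size: int,
                 chunk_sizes: List[int] = CHUNK_SIZES) -> Optional[int]:
    """Bytes left unused inside the chosen chunk (internal fragmentation)."""
    slab_class = pick_slab_class(item_size, chunk_sizes)
    if slab_class is None:
        return None
    return chunk_sizes[slab_class - 1] - item_size
```

The second function shows the price of the scheme: a 65-byte item occupies a 128-byte chunk and wastes 63 bytes, which is why real chunk sizes grow by a small factor rather than doubling.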
Slab Allocator
To see the slab classes, run Memcached with the -vv option:
$ memcached -m 64 -vv
Output:
slab class 1: chunk size 96 perslab 10922
slab class 2: chunk size 120 perslab 8738
slab class 3: chunk size 152 perslab 6898
slab class 4: chunk size 192 perslab 5461
.
.
.
slab class 42: chunk size 1048576 perslab 1
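The numbers in this output follow a simple pattern that can be reproduced in a short sketch. The assumptions here are read off the output rather than taken from the slides: the first chunk is 96 bytes, each class is 1.25 times larger than the previous one, sizes are rounded up to a multiple of 8 bytes, a page is 1 MB, and the last class holds exactly one maximum-size item.

```python
from typing import List, Tuple


def slab_classes(first_chunk: int = 96, factor: float = 1.25,
                 item_max: int = 1024 * 1024,
                 align: int = 8) -> List[Tuple[int, int]]:
    """Return (chunk size, chunks per 1 MB page) for each slab class."""
    classes = []
    size = first_chunk
    while size <= item_max / factor:
        if size % align:
            size += align - size % align     # round up to the alignment
        classes.append((size, item_max // size))
        size = int(size * factor)            # next class is ~25% larger
    classes.append((item_max, 1))            # final class: one full page
    return classes


if __name__ == "__main__":
    for number, (size, perslab) in enumerate(slab_classes(), start=1):
        print(f"slab class {number}: chunk size {size} perslab {perslab}")
```

A smaller growth factor gives more classes and less wasted space per item, at the cost of spreading memory over more classes.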
Handling Client Requests
• Client requests are handled with libevent
and assigned to threads in Memcached
• Each thread waits to acquire the lock
to enter the critical section, where it does the hash
table processing and updates the LRU data structure
• Then, the thread leaves the critical section
and sends back the result to the client.
→ Every internal operation in Memcached
has a cost of O(1).
→ The critical section is a performance
bottleneck since the execution is serial.
A Cluster of Memcached Servers

• It is possible to run a number of
Memcached instances on different machines.
• The instances are unaware of one another
and do not communicate with each other.
• The client API determines which server to
use by applying a hash function to the item’s key.
• Servers are grouped together with consistent
hashing.
• A server can have a weight to affect its
selection (default: 1).
• In the event of failure, the client can use
other servers.
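Client-side server selection with consistent hashing can be sketched as a hash ring. This is one common way to do it, not the algorithm of any particular Memcached client: the `HashRing` class, the use of SHA-256, and the 100 ring points per unit of weight are all choices made here for illustration.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple


class HashRing:
    """Consistent-hash ring mapping keys to weighted servers."""

    def __init__(self, servers: Dict[str, int],
                 points_per_weight: int = 100) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        ring: List[Tuple[int, str]] = []
        for address, weight in servers.items():
            # A heavier server gets more points, hence more keys.
            for i in range(weight * points_per_weight):
                ring.append((self._hash(f"{address}#{i}"), address))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(text: str) -> int:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big")

    def server_for(self, key: str) -> str:
        """Pick the first ring point at or after the key's hash (wrapping)."""
        index = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[index][1]
```

The point of the ring is what happens on failure: when a server is removed, only the keys that lived on it move to other servers, while all other keys keep their placement and their cached data stays reachable.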
Memcached and CAP Theorem

✓ Consistency: All clients see the same data at the same time.
× Availability: Some clients might not be able to access Memcached at
some time.
✓ Partition tolerance: Memcached continues to work in case of arbitrary
message loss or failure of other servers.
Redis vs Memcached
Redis | Memcached
Persistent storage | Transient caching
In memory, able to persist to disk | In-memory caching only
Appropriate for complex data structures (hashes, lists, sets, etc.) | Appropriate for simple key-value pairs
Master/Slave replication | No support for replication
High performance for large volumes of read and write operations | High performance mainly for large volumes of read operations
Key length limit: 512 MB | Key length limit: 250 bytes
Open source (BSD license) | Open source (Revised BSD license)
“Memory is the new disk, disk is the new tape.”
—Jim Gray (1944 – 2007)
Questions?
Thank you for your attention.

Seminar_Final

  • 1. Caching: Redis & Memcached Cheng Zhang, Koosha Khajehmoogahi
  • 2. Agenda 2 • What is caching? • What is Redis? • What is Memcached? • Redis vs. Memcached
  • 4. A sample scenario: Website 4
  • 5. A sample scenario: Website 4
  • 6. A sample scenario: Website 4
  • 7. Redis vs Memcached In-memory Key-Value Cache 5
  • 8. What is Redis Redis is an open source, BSD licensed, advanced key-value cache and store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets, sorted sets, bitmaps and hyperloglogs.[1] 6[1] http://redis.io
  • 9. History Redis 3.0 Redis cluster supported Redis 2.8 Asynchronous replication used Redis 2.6 Lua interpreter built to evaluate scripts … Redis 1.0 April.09 7 Platforms & Clients C, C#, C++, Go, Java, Node.js, Objective-C, Perl, PHP, Python, Ruby, Scala …
  • 10. Technical Architecture • Persistence • Virtual Memory • Key Value • Master/Slave • Cluster • Pub/Sub 8
  • 11. Persistence • Datasets can be saved to disk. 1. RDB(Redis DB) snaphots (default) RDB is a very compact single-file point-in-time representation of the Redis dataset. 2. AOF(Append Only File) logs AOF is a simple text log of write operations. 9
  • 12. Virtual Memory The goal of the VM subsystem is to free memory transferring Redis Objects from memory to disk. 10 Keys swap file Values Values Values Redis Storage
  • 13. Virtual Memory The goal of the VM subsystem is to free memory transferring Redis Objects from memory to disk. 10 Keys Memory Storage swap file Values Values Values
  • 14. Virtual Memory The goal of the VM subsystem is to free memory transferring Redis Objects from memory to disk. 10 Keys Memory Storage swap file Values Values Values
  • 15. Virtual Memory The goal of the VM subsystem is to free memory transferring Redis Objects from memory to disk. 10 Keys Memory Storage swap file Values Values Values
  • 16. swap file Virtual Memory The goal of the VM subsystem is to free memory transferring Redis Objects from memory to disk. 10 Keys Memory Storage swap file Values Values
  • 17. swap file Virtual Memory The goal of the VM subsystem is to free memory transferring Redis Objects from memory to disk. 10 Keys Memory Storage swap file Values Values Values
  • 18. Redis keys are binary safe, this means that you can use any binary sequence as a key, from a string like "foo" to the content of a JPEG file. The empty string is also a valid key. For example: object-type:id Key Value 11
  • 19. • Binary-safe strings. • Lists: collections of string elements sorted according to the order of insertion. They are basically linked lists. • Sets: collections of unique, unsorted string elements. • Sorted sets, similar to Sets but where every string element is associated to a floating number value, called score. • Hashes, which are maps composed of fields associated with values. Both the field and the value are strings. • Bit arrays (or simply bitmaps): it is possible, using special commands, to handle String values like an array of bits: you can set and clear individual bits, count all the bits set to 1, find the first set or unset bit, and so forth. • HyperLogLogs: this is a probabilistic data structure which is used in order to estimate the cardinality of a set. Key Value 12
  • 23. Cluster 13 Master B 5501-11000 Master A 0-5500 Master C 11001-16384 Cluster Bus Every Redis Cluster node requires two TCP connections open. • The normal Redis TCP port used to serve clients. • The high port is used for the Cluster bus, that is a node-to- node communication channel using a binary protocol.
  • 24. Cluster 13 Master B 5501-11000 Master A 0-5500 Master C 11001-16384 Cluster Bus Every Redis Cluster node requires two TCP connections open. • The normal Redis TCP port used to serve clients. • The high port is used for the Cluster bus, that is a node-to- node communication channel using a binary protocol. A given key "foo" is at slot: slot = crc16("foo") mod NUMER_SLOTS
  • 25. Cluster • All nodes are directly connected with a service channel. TCP baseport+4000, example 6379 -> 10379. • Node to Node protocol is binary, optimized for bandwidth and speed. • Clients talk to nodes as usually, using ascii protocol, with minor additions. • Nodes don't proxy queries. 14 Client Client Client
  • 26. Cluster 14 Client Client Client • Dummy Client 1. Client => A: GET foo 2. A => Client: -MOVED 8 192.168.5.21:6391 3. Client => B: GET foo 4. B => Client: "bar" • Smart Client 1. Client => A: CLUSTER HINTS 2. A => Client: ... a map of hash slots -> nodes 3. Client => B: GET foo 4. B => Client: "bar"
  • 29. Master/Slave 15 Master 1st Slave 1st Slave slaveof masterip port
  • 30. Master/Slave 15 Master 1st Slave 2nd Slave 1st Slave 2nd Slave slaveof masterip port
  • 31. Model Flow 16 Client Master Slave 1. Request Async Model Flow
  • 32. Model Flow 16 Client Master Slave 1. Request 2. Response Async Model Flow
  • 33. Model Flow 16 Client Master Slave 1. Request 2. Response 3. Binlog Copy 4. Apply Replicate Async Model Flow
  • 34. Model Flow 17 Sync Model Flow Client Master Slave 1. Request 4. Response 2. Binlog Copy 3. Apply Replicate
  • 35. Model Flow 18 Sem-Sync Model Flow Client Master Slave 1. Request 3. Response 2. Binlog Copy 4. Apply Replicate
  • 36. Cluster & Master/Slave 19 Master A 1, 2, 4 Master B 3, 7, 8 Master C 5, 6, 9, 10
  • 37. Cluster & Master/Slave 19 Master A 1, 2, 4 Master B 3, 7, 8 Master C 5, 6, 9, 10 Slave A1 1, 2, 4 Slave A2 1, 2, 4 Slave B1 3, 7, 8 Slave B2 3, 7, 8 Slave C1 5, 6, 9, 10 Slave C2 5, 6, 9, 10
  • 38. Cluster & Master/Slave 19 Master A 1, 2, 4 Master B 3, 7, 8 Master C 5, 6, 9, 10 Slave A1 1, 2, 4 Slave A2 1, 2, 4 Slave B1 3, 7, 8 Slave B2 3, 7, 8 Slave C1 5, 6, 9, 10 Slave C2 5, 6, 9, 10 Overloaded
  • 39. Cluster & Master/Slave 19 Master A 1, 2, 4 Master B 3, 7, 8 Master C 5, 6, 9, 10 Master D Slave A1 1, 2, 4 Slave A2 1, 2, 4 Slave B1 3, 7, 8 Slave B2 3, 7, 8 Slave C1 5, 6, 9, 10 Slave C2 5, 6, 9, 10 10 Master D 10
  • 40. Cluster & Master/Slave 19 Master A 1, 2, 4 Master B 3, 7, 8 Master C 5, 6, 9, 10 Master D Slave A1 1, 2, 4 Slave A2 1, 2, 4 Slave B1 3, 7, 8 Slave B2 3, 7, 8 Slave C1 5, 6, 9, 10 Slave C2 5, 6, 9, 10 10 Master D 10
  • 41. Cluster & Master/Slave 19 Master A 1, 2, 4 Master B 3, 7, 8 Master C 5, 6, 9, 10 Master D Slave A1 1, 2, 4 Slave A2 1, 2, 4 Slave B1 3, 7, 8 Slave B2 3, 7, 8 Slave C1 5, 6, 9, 10 Slave C2 5, 6, 9, 10 Master C 5, 6, 9 Slave C2 5, 6, 9 Slave C1 5, 6, 9 10 Master D 10
  • 42. Cluster & Master/Slave 20 Master A 1, 2, 4 Master B 3, 7, 8 Master C 5, 6, 9, 10 Master D Slave A1 1, 2, 4 Slave A2 1, 2, 4 Slave B1 3, 7, 8 Slave B2 3, 7, 8 Slave C1 5, 6, 9, 10 Slave C2 5, 6, 9, 10 Master C 5, 6, 9 Slave C2 5, 6, 9 Slave C1 5, 6, 9 Master D 10
  • 43. Cluster & Master/Slave 20 Master A 1, 2, 4 Master B 3, 7, 8 Master C 5, 6, 9, 10 Master D Slave A1 1, 2, 4 Slave A2 1, 2, 4 Slave B1 3, 7, 8 Slave B2 3, 7, 8 Slave C1 5, 6, 9, 10 Slave C2 5, 6, 9, 10 Master C 5, 6, 9 Slave C2 5, 6, 9 Slave C1 5, 6, 9 Master D 10 Failed
  • 44. Cluster & Master/Slave 20 Master A 1, 2, 4 Master C 5, 6, 9, 10 Master D Slave A1 1, 2, 4 Slave A2 1, 2, 4 Slave B2 3, 7, 8 Slave C1 5, 6, 9, 10 Slave C2 5, 6, 9, 10 Master C 5, 6, 9 Slave C2 5, 6, 9 Slave C1 5, 6, 9 Master B 3, 7, 8 Master D 10
  • 45. Cluster & Master/Slave 20 Master A 1, 2, 4 Master C 5, 6, 9, 10 Master D Slave A1 1, 2, 4 Slave A2 1, 2, 4 Slave B2 3, 7, 8 Slave C1 5, 6, 9, 10 Slave C2 5, 6, 9, 10 Master C 5, 6, 9 Slave C2 5, 6, 9 Slave C1 5, 6, 9 Master B 3, 7, 8 Master D 10
  • 46. Example: LINE with Redis 21
  • 47. Example: LINE with Redis 21
  • 54. Pub/Sub 22 Redis Publisher Channel Subscriber Channel Subscriber Channel Subscriber Channel Channel Channel Channel Channel Message: Hello Message: HelloMessage: Hello redis> PSUBSCRIBE news.*
  • 56. Example with Redis in Publish / Subscribe 24
  • 57. Example with Redis in Publish / Subscribe 24
  • 58. Conclusion Redis is a different evolution path in the key-value DBs where values can contain more complex data types, with atomic operations defined on those data types. Redis is an in-memory but persistent on disk database, so it represents a different trade off where very high write and read speed is achieved with the limitation of data sets that can't be larger than memory. 25
  • 59. Memcached • What is Memcached? • Where is it used? • How is it used? • How does it work internally? • A Cluster of Memcached Servers • Memcached and CAP Theorem • Memcached vs Redis 26
  • 60. What is Memcached? Memcached is a free and open-source general-purpose distributed in-memory object caching system intended to be used for web applications. Its primary goal is to cache the data returned from database interactions and/or cache the result of costy operations and API calls. 27
  • 61. History & Characteristics • Originally written in Perl in 2003 for LiveJournal.com, then rewritten in C • First release in May 2003 • Released under Revised BSD License • Platform-independent (Windows, Linux, BSD, UNIX, Mac) • Client APIs in many different languages: • C, C++, Java, PHP, Perl, Python, Ruby, Windows .NET, Lua, … • Used by many giant web sites: Facebook, YouTube, Reddit, Twitter, Wikipedia, Flickr, etc. 28
  • 62. Where is it used? (Architectural view) Memcached resides between Front-end Web tier and the Back-end Database tier. Front-end web tier tries to fetch data from Memcached at first and then interacts with database in case of cache miss. 1 1 2 2 29
  • 63. How is it used? (Conceptual view) From a programmer’s point of view, Memcached is a collection of key-value items stored in memory. Keys are unique identifiers with a possible length of up to 250 characters. Values can have any format including raw binary data and they can contain data up to 1 MB (by default). 30
  • 64. How is it used? (Conceptual view) Example (in Perl): use Cache::Memcached; $cache = new Cache::Memcached { servers => ["192.168.0.10:11211", "192.168.0.20:11211"] }; $user_info = $cache->get("user:$id"); if (!$user_info) { $user_info = new User($id); $cache->set("user:$id", $user_info); } print $user_info->name; 31
  • 65. Some Common Operations • set(key, val) • add(key, val): like set, but succeeds only if the key does not already exist • replace(key, val): like set, but succeeds only if the key already exists • get(key) • delete(key) • incr(key) • decr(key)
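The difference between set, add and replace is easiest to see side by side. The following is a minimal Python sketch of the semantics listed on this slide, backed by a plain dict. `TinyCache` is a hypothetical stand-in, not a real Memcached client, and it ignores flags, TTLs and networking. It also assumes that incr/decr fail on a missing key and that decr never goes below zero.

```python
class TinyCache:
    """Dict-backed illustration of the command semantics (not a real client)."""

    def __init__(self):
        self._items = {}

    def set(self, key, val):
        # Always stores, whether or not the key exists.
        self._items[key] = val
        return True

    def add(self, key, val):
        # Stores only if the key is absent.
        if key in self._items:
            return False
        self._items[key] = val
        return True

    def replace(self, key, val):
        # Stores only if the key is already present.
        if key not in self._items:
            return False
        self._items[key] = val
        return True

    def get(self, key):
        return self._items.get(key)

    def delete(self, key):
        return self._items.pop(key, None) is not None

    def incr(self, key, delta=1):
        # Assumption: a missing key is reported as a failure (None here).
        if key not in self._items:
            return None
        self._items[key] = int(self._items[key]) + delta
        return self._items[key]

    def decr(self, key, delta=1):
        # Assumption: the counter is clamped at zero.
        if key not in self._items:
            return None
        self._items[key] = max(0, int(self._items[key]) - delta)
        return self._items[key]
```

A typical use of add is to initialise a value only once: several clients can call `add("counter", 0)` and only the first call succeeds.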
  • 66. How does it work internally? (Physical view) • Characteristics of data items • Internal data storage and memory management • Cluster topology
  • 67. Characteristics of data items • Each data item consists of a key, a value, optional flags and an optional expiration time (TTL) • An item is removed from the cache once its TTL has expired, or when the cache is full and the item is selected by the replacement algorithm • Memcached uses a Least Recently Used (LRU) policy to free space in the cache
  • 68. Internal data storage and memory management Memcached has four primary internal data structures: • A hash table to store and locate cache items • A Least Recently Used (LRU) list to select cache items for removal (eviction) when the cache is full • A cache item data structure for storing key, flags, data and pointers • A slab allocator which handles memory management for cache items
  • 69. Hash Table • The hash table data structure is an array whose elements are buckets • Each bucket is a singly linked list of cache items • Each cache item holds the pointer to the next item in its bucket
  • 70. LRU Data Structure • LRU is used to determine which cache item to evict • The LRU is a doubly linked list holding all cache items of a particular slab class • Each slab class has its own LRU data structure • The LRU pointers are located in each cache item data structure • Each access/modification of a cache item updates its pointers in its LRU • The last item (tail) of each list is the item to be removed in the event of eviction
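The eviction behaviour described on this slide can be sketched in a few lines of Python. This is an illustration only: it uses `collections.OrderedDict` in place of Memcached's hand-written doubly linked list, models a single LRU rather than one per slab class, and the class name `LRUCache` and the item-count `capacity` are assumptions made for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Single LRU list with a fixed capacity, counted in items."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Front of the OrderedDict = least recently used (the eviction end),
        # back = most recently used.
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        # An access moves the item to the most-recently-used end.
        self._items.move_to_end(key)
        return self._items[key]

    def set(self, key, val):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = val
        if len(self._items) > self.capacity:
            # Cache is full: evict the least recently used item.
            self._items.popitem(last=False)
```

With a capacity of 2, storing `a` and `b`, reading `a`, and then storing `c` evicts `b`, because `a` was touched more recently.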
  • 71. Slab Allocator Problems with using malloc() and free() for memory management: • Poor performance • Memory fragmentation after some time. To overcome these problems, Memcached uses a slab allocator: • Memory is reserved in large fixed-size pages, up to the allowed amount of memory, instead of per item • Each page is assigned to a slab class; all pages are equal in size • Each page is divided into smaller, equal-size pieces for data storage called chunks; the chunk size is determined by the slab class • The slab allocator takes care of pages and chunks • A cache item is stored in a chunk of the slab class with the smallest chunk size that is bigger than or equal to the item
  • 72. Slab Allocator Example: 2048 bytes of allowed memory is split into 4 slab classes (512 bytes each) and each slab class is divided into chunks • slab class 1: 8 chunks × 64 B • slab class 2: 4 chunks × 128 B • slab class 3: 2 chunks × 256 B • slab class 4: 1 chunk × 512 B
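The rule "use the smallest chunk that fits the item" can be illustrated with the chunk sizes from this example. The sketch below is Python rather than Memcached's C code, and the function name `pick_slab_class` is hypothetical.

```python
from bisect import bisect_left

# Chunk sizes of slab classes 1..4 from the example, in bytes.
CHUNK_SIZES = [64, 128, 256, 512]


def pick_slab_class(item_size):
    """Return (slab class number, chunk size) for the smallest chunk that
    can hold item_size bytes, or None if the item fits in no chunk."""
    idx = bisect_left(CHUNK_SIZES, item_size)
    if idx == len(CHUNK_SIZES):
        return None  # larger than the biggest chunk
    return idx + 1, CHUNK_SIZES[idx]
```

A 100-byte item lands in slab class 2 (128 B chunks), so 28 bytes of that chunk stay unused. This internal waste is the price paid for avoiding fragmentation.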
  • 73. Slab Allocator To see the slab classes, run Memcached with the -vv option:
      $ memcached -m 64M -vv
    Output:
      slab class 1: chunk size 96 perslab 10922
      slab class 2: chunk size 120 perslab 8738
      slab class 3: chunk size 152 perslab 6898
      slab class 4: chunk size 192 perslab 5461
      . . .
      slab class 42: chunk size 1048576 perslab 1
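The chunk sizes in this output follow a geometric progression. The Python sketch below reproduces the listed numbers under several assumptions that are not stated on the slide: a first chunk size of 96 bytes, a growth factor of 1.25, rounding up to a multiple of 8 bytes, and a page size of 1 MiB. The function name `slab_classes` is hypothetical, and other versions or settings may produce different values.

```python
def slab_classes(first_chunk=96, factor=1.25, page_size=1024 * 1024, align=8):
    """Return a list of (chunk_size, chunks_per_page) tuples, one per slab class.

    All default parameter values are assumptions chosen to match the
    output shown on the slide.
    """
    classes = []
    size = first_chunk
    while size <= page_size / factor:
        # Round the chunk size up to the next multiple of `align` bytes.
        if size % align:
            size += align - size % align
        classes.append((size, page_size // size))
        # The next class is `factor` times larger (truncated to an integer).
        size = int(size * factor)
    # The last class holds a single item that fills a whole page.
    classes.append((page_size, 1))
    return classes


for number, (chunk, per_page) in enumerate(slab_classes()[:4], start=1):
    print(f"slab class {number}: chunk size {chunk} perslab {per_page}")
```

The "perslab" value is simply the page size divided by the chunk size, rounded down: 1048576 // 96 = 10922.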
  • 74. Handling Client Requests • Client requests are handled with libevent and assigned to threads in Memcached • Each thread waits to acquire the lock to enter the critical section, where it does the hash table processing and updates the LRU data structure • Then the thread leaves the critical section and sends the result back to the client. → Every internal operation in Memcached has a cost of O(1). → The critical section is a performance bottleneck, since execution inside it is serial.
  • 75. A Cluster of Memcached Servers • It is possible to run a number of Memcached instances on different machines. • The instances are unaware of one another and do not communicate with each other. • The client API determines which server to use by applying a hash function to the item’s key. • Keys are distributed across the servers with consistent hashing. • A server can have a weight that affects how often it is selected (default: 1). • In the event of a server failure, the client can use the other servers.
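Client-side server selection with consistent hashing can be sketched as a hash ring. This Python illustration is not taken from any particular Memcached client: the class name `HashRing`, the use of MD5, and the figure of 100 ring points per unit of weight are all assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Maps keys to servers; servers is a dict of "host:port" -> weight."""

    def __init__(self, servers, points_per_weight=100):
        ring = []
        for server, weight in servers.items():
            # A server with a higher weight gets more points on the ring,
            # and therefore a larger share of the keys.
            for i in range(points_per_weight * weight):
                ring.append((self._hash(f"{server}#{i}"), server))
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]

    @staticmethod
    def _hash(text):
        # Assumption: the first 8 bytes of MD5 serve as the ring position.
        return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")

    def server_for(self, key):
        # Walk clockwise to the first ring point at or after the key's hash,
        # wrapping around at the end of the ring.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing({"192.168.0.10:11211": 1, "192.168.0.20:11211": 1})
print(ring.server_for("user:42"))
```

The point of the ring, compared with `hash(key) % number_of_servers`, is that removing a server remaps only the keys that lived on that server. Keys on the remaining servers stay where they are.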
  • 76. Memcached and CAP Theorem ✓ Consistency: All clients see the same data at the same time. × Availability: Some clients might not be able to access Memcached at some time. ✓ Partition tolerance: Memcached continues to work in case of arbitrary message loss or failure of other servers.
  • 77. Redis vs Memcached
      Redis | Memcached
      Persistent storage | Transient caching
      In memory, able to persist to disk | In-memory caching only
      Appropriate for complex data structures (hashes, lists, sets, etc.) | Appropriate for simple key-value pairs
      Master/slave replication | No support for replication
      High performance with large volumes of read and write operations | High performance only with large volumes of read operations
      Key length limit: 512 MB | Key length limit: 250 bytes
      Open source (BSD license) | Open source (Revised BSD license)
  • 78. “Memory is the new disk, disk is the new tape.” - Jim Gray (1944–2007)
  • 79. Questions? Thank you for your attention.