1. Memory is the new disk, disk is the new tape
Bela Ban, JBoss / Red Hat
2. Motivation
● We want to store our data in memory
– Memory access is faster than disk access, even across a network
– A DB requires network communication, too
● The disk is used for archival purposes
● Not a replacement for DBs!
– Only a key-value store
– NoSQL
3. Problems
● #1: How do we provide memory large enough to store the data (e.g. 2 TB)?
● #2: How do we guarantee persistence?
– Survival of data between reboots / crashes
4. #1: Large memory
● We aggregate the memory of all nodes in a cluster into one large virtual memory space
– 100 nodes with 10 GB each == 1 TB of virtual memory
5. #2: Persistence
● We store keys redundantly on multiple nodes
– Unless all nodes on which key K is stored crash at the same time, K is persistent
● We can also store the data on disk
– To prevent data loss in case all cluster nodes crash
– This can be done asynchronously, on a background thread (see the sketch below)
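A minimal sketch of such an asynchronous write-behind, assuming a plain in-memory map and a single background writer thread. All class and method names here are illustrative, not ReplCache's actual persistence code.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.concurrent.*;

// Puts are applied to the in-memory map immediately; a single background
// thread streams each update to disk, off the hot path.
public class AsyncDiskStore {
    private final ConcurrentMap<String, String> memory = new ConcurrentHashMap<>();
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private final Path dir;

    public AsyncDiskStore(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    public void put(String key, String value) {
        memory.put(key, value);               // fast, in-memory
        writer.submit(() -> {                 // archival copy, written later
            try {
                Files.write(dir.resolve(key), value.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                e.printStackTrace();          // a real store would retry or log
            }
        });
    }

    public String get(String key) {
        return memory.get(key);               // reads never touch the disk
    }

    public void close() {
        writer.shutdown();                    // queued writes are drained first
    }
}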
7. Store every key on every node
A    B    C    D
K1   K1   K1   K1
K2   K2   K2   K2
K3   K3   K3   K3
K4   K4   K4   K4

● RAID 1
● Pro: data is available everywhere
– No network round trip
– Data loss only when all nodes crash
● Con: we can only use 25% of our memory (with 4 nodes)
8. Store every key on 1 node only
A    B    C    D
K1   K2   K3   K4

● RAID 0, JBOD
● Pro: we can use 100% of our memory
● Con: data loss on node crash
– No redundancy
9. Store every key on K nodes
A    B    C    D
K1   K1
     K2   K2
          K3   K3
K4             K4

● K is configurable (2 in the example)
● Variable RAID
● Pro: we can use a variable % of our memory
– The user determines the tradeoff between memory consumption and risk of data loss
– E.g. 4 nodes with 10 GB each: K=2 gives 20 GB usable, vs 10 GB with full replication and 40 GB with no redundancy
10. So how do we determine on which nodes the keys are stored?
11. Consistent hashing
● Given a key K and a set of nodes, CH(K) will always pick the same node P for K
– We can also pick a list {P,Q} for K
● Anyone 'knows' that K is on P
● If P leaves, CH(K) will pick another node Q and rebalance the affected keys
● A good CH will rebalance at most 1/N of the keys (where N = number of cluster nodes); see the sketch below
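A minimal consistent-hashing sketch in Java: nodes are placed on a ring of hash values (with virtual nodes to smooth the distribution), and CH(K) walks clockwise from hash(K), returning the first N distinct nodes, primary owner first. The ring layout and the MD5-based hash are illustrative assumptions, not JGroups' actual implementation.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.*;

public class ConsistentHash {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(String node) {
        for (int i = 0; i < 100; i++)            // virtual nodes smooth the distribution
            ring.put(hash(node + "#" + i), node);
    }

    public void removeNode(String node) {
        ring.values().removeIf(n -> n.equals(node));
    }

    /** Returns the owner list {P, Q, ...} for key K, primary owner first. */
    public List<String> pick(String key, int owners) {
        List<String> result = new ArrayList<>();
        List<String> walk = new ArrayList<>(ring.tailMap(hash(key)).values());
        walk.addAll(ring.values());              // wrap around the ring
        for (String node : walk) {
            if (!result.contains(node))
                result.add(node);
            if (result.size() == owners)
                break;
        }
        return result;
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++)
                h = (h << 8) | (d[i] & 0xffL);
            return h;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}

With nodes A..D added and owners=2, pick("K2", 2) might return [B, C]; after removeNode("B"), calling pick again yields a new owner list, which is what drives the rebalancing in the following slides.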
12. Example
A    B    C    D
K1   K1
     K2   K2
          K3   K3
K4             K4

● K2 is stored on B (primary owner) and C (backup owner)
13. Example
A    B    C    D
K1   K1
     K2   K2
          K3   K3
K4             K4

● Node B now crashes
14. Example
A    B*   C    D
K1   K1   K1
     K2   K2   K2
          K3   K3
K4             K4
(* B has crashed; its copies of K1 and K2 are gone)

● C (the backup owner of K2) copies K2 to D
– C is now the primary owner of K2
● A copies K1 to C
– C is now the backup owner of K1
15. Rebalancing
● Unless all N owners of a key K crash at exactly the same time, K is always stored redundantly
● When fewer than N owners crash, rebalancing copies/moves keys to other nodes, so that we have N owners again (see the sketch below)
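A sketch of the rebalancing step, reusing the ConsistentHash sketch above: on a view change, each surviving node recomputes CH(K) for its local keys and pushes copies to the current owners. The class and the send() stand-in are illustrative assumptions; a real implementation would also skip owners that already hold a key and drop keys the node no longer owns.

import java.util.*;

public class Rebalancer {
    private final ConsistentHash ch;
    private final String self;
    private final int replCount;
    private final Map<String, String> localKeys = new HashMap<>();

    public Rebalancer(ConsistentHash ch, String self, int replCount) {
        this.ch = ch;
        this.self = self;
        this.replCount = replCount;
    }

    public void storeLocally(String key, String value) {
        localKeys.put(key, value);
    }

    /** Called when the cluster membership changes, e.g. when B crashes. */
    public void onNodeCrashed(String crashed) {
        ch.removeNode(crashed);
        for (Map.Entry<String, String> e : localKeys.entrySet()) {
            for (String owner : ch.pick(e.getKey(), replCount)) {
                if (!owner.equals(self))
                    send(owner, e.getKey(), e.getValue());  // copy to the (possibly new) owner
            }
        }
    }

    private void send(String node, String key, String value) {
        System.out.printf("copy %s -> %s%n", key, node);    // stand-in for a network send
    }
}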
16. Enter ReplCache
● ReplCache is a distributed hashmap spanning the entire cluster
● Operations: put(K,V), get(K), remove(K)
● For every key, we can define how many times we'd like it to be stored in the cluster (see the example below):
– 1: RAID 0
– -1: RAID 1
– N: variable RAID
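A short usage sketch, following the put(key, value, repl_count, timeout) style of the JGroups ReplCache demo; exact signatures may differ across JGroups versions, and "udp.xml" and the cluster name are placeholders.

import org.jgroups.blocks.ReplCache;

public class ReplCacheDemo {
    public static void main(String[] args) throws Exception {
        ReplCache<String, String> cache = new ReplCache<>("udp.xml", "replcache-cluster");
        cache.start();

        cache.put("K1", "v1", (short) 1, 0);   //  1: RAID 0, one copy only
        cache.put("K2", "v2", (short) -1, 0);  // -1: RAID 1, a copy on every node
        cache.put("K3", "v3", (short) 2, 0);   //  2: variable RAID, two owners

        System.out.println(cache.get("K2"));
        cache.remove("K1");
        cache.stop();
    }
}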
17. Use of ReplCache
[Diagram: HTTP clients reach Apache, which load-balances via mod_jk to a JBoss cluster; each JBoss node runs a Servlet plus ReplCache, with a DB behind the cluster]
19. Use cases
● JBoss AS: session distribution using Infinispan
– For data scalability, sessions are stored only N times in the cluster
● GridFS (Infinispan)
– I/O over the grid
– Files are chunked into slices; each slice is stored in the grid, redundantly if needed (see the sketch below)
– Store a 4 GB DVD in a grid where each node has only 2 GB of heap
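A minimal chunking sketch in the spirit of GridFS: a large file is cut into fixed-size slices, each stored under "name#index". The plain Map stands in for the distributed cache, and the chunk size is an illustrative assumption.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.*;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class Chunker {
    static final int CHUNK_SIZE = 8 * 1024;   // real chunk sizes are configurable

    public static void store(Path file, Map<String, byte[]> grid) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[CHUNK_SIZE];
            int n, index = 0;
            while ((n = in.read(buf)) > 0)     // one grid entry per slice
                grid.put(file.getFileName() + "#" + index++, Arrays.copyOf(buf, n));
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> grid = new ConcurrentHashMap<>();  // stand-in for the grid
        store(Paths.get(args[0]), grid);
        System.out.println(grid.size() + " chunks stored");
    }
}

Since no single entry exceeds CHUNK_SIZE, a file larger than any one node's heap can still be stored, as long as the slices (and their redundant copies) fit in the aggregated memory of the cluster.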
20. Use cases
● Hibernate Over Grid (OGM)
– Replaces the DB backend with an Infinispan-backed grid
21. Conclusion
● Given enough nodes in a cluster, we can provide persistence for data
● Unlike RAID, where everything is stored fully redundantly (even /tmp), we can define persistence guarantees per key
● Ideal for data sets which need to be accessed quickly
– For the paranoid, we can still stream to disk
22. Conclusion
● Data is distributed over a grid
– Cache is closer to clients
– No bottleneck to the DBMS
– Keys are on different nodes