For a long time, relational database management systems were the only solution for persistent data storage. With the phenomenal growth of data, however, this conventional way of storing data has become problematic.
To manage the exponentially growing data traffic, the largest information technology companies, such as Google, Amazon and Yahoo, have developed alternative solutions that store data in what have come to be known as NoSQL databases.
Typical NoSQL features are a flexible schema, horizontal scaling and the absence of full ACID support. NoSQL databases store and replicate data in distributed systems, often across datacenters, to achieve scalability and reliability.
The CAP theorem states that any networked shared-data system (e.g. a NoSQL database) can have at most two of three desirable properties:
• consistency (C) - equivalent to having a single up-to-date copy of the data
• availability (A) of that data (for reads and writes)
• tolerance to network partitions (P)
Because of this inherent tradeoff, one of these properties must be sacrificed. The general belief is that designers cannot sacrifice P and therefore face a difficult choice between C and A.
In this seminar two NoSQL databases are presented: Amazon's Dynamo, which sacrifices consistency to achieve very high availability, and Google's BigTable, which guarantees strong consistency while providing only best-effort availability.
2. Outline
• Introduction to NoSQL
• Introduction to Dynamo and BigTable
• Dynamo vs. BigTable comparison
• Open source implementations
3. Introduction to NoSQL
• New generation of databases
• Response to a “big data” challenge
• Main characteristics:
– Non-relational
– Distributed
– Fault tolerant
– Scalable
5. Dynamo and BigTable - Introduction
Dynamo (Amazon) - highly available key-value store
• Giuseppe DeCandia, et al.: Dynamo: Amazon's Highly Available Key-value Store. SOSP 2007
BigTable (Google) - storage for structured data
• Fay Chang, et al.: Bigtable: A Distributed Storage System for Structured Data. OSDI 2006
7. Architecture
Dynamo
• Decentralized:
– Every node has the same set of responsibilities as its peers.
– There is no single point of failure.
BigTable
• Centralized:
– A single master node maintains all system metadata.
– Other nodes (tablet servers) handle read and write requests.
8. Data Model
Dynamo
• Key-value - data is stored as <key, value> pairs, such that the key is a unique identifier and the value is an arbitrary entry.
• Example: key 188 → { "Name": "John", "Email": "john@g.com", "Card": "6652" }; key 145 → { "Name": "Bob", "Phone": "781455", "Card": "9875" }
BigTable
• Multidimensional sorted map - the map is indexed by a row key and a column key, and ordered by row key. Column keys are grouped into sets called column families.
• Example: row key 145 has column family "Personal Data" (Name = "Bob", Phone = "781455") and column family "Financial Data" (Card = "9875"); row key 188 has "Personal Data" (Name = "John", Email = "john@g.com") and "Financial Data" (Card = "6652").
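The two data models can be sketched with plain Python dictionaries (a minimal illustration using the slide's sample records; the variable names are mine, not part of either system):

```python
# Dynamo: a flat key-value map; the value is an opaque blob that the
# store does not interpret.
dynamo_store = {
    "188": {"Name": "John", "Email": "john@g.com", "Card": "6652"},
    "145": {"Name": "Bob", "Phone": "781455", "Card": "9875"},
}

# BigTable: rows ordered by row key; within a row, columns are grouped
# into column families, so a cell is addressed by
# (row key, column family, column key).
bigtable_store = {
    "145": {"Personal Data": {"Name": "Bob", "Phone": "781455"},
            "Financial Data": {"Card": "9875"}},
    "188": {"Personal Data": {"Name": "John", "Email": "john@g.com"},
            "Financial Data": {"Card": "6652"}},
}

# Dynamo can only fetch the whole value for a key; BigTable can address
# an individual cell inside a row.
whole_value = dynamo_store["145"]
single_cell = bigtable_store["145"]["Financial Data"]["Card"]  # "9875"
```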
9. API
Dynamo
• get - returns the object associated with the given key.
• put - associates the given object with the specified key.
BigTable
• get - returns values from individual rows.
• scan - iterates over multiple rows.
• put - inserts a value into the specified table cell.
• delete - deletes a whole row or a specified cell inside a particular row.
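The BigTable calls above can be sketched as a toy in-memory table (the class name `TinyBigTable` is hypothetical; this is an illustration of the four operations, not the real API, and it ignores column families and timestamps):

```python
import bisect

class TinyBigTable:
    """Toy table exposing get, scan, put and delete.
    Rows are kept sorted by row key, which is what makes scan cheap."""
    def __init__(self):
        self.rows = {}   # row key -> {column key -> value}
        self.keys = []   # row keys in sorted order, for range scans

    def put(self, row, column, value):
        if row not in self.rows:
            bisect.insort(self.keys, row)
            self.rows[row] = {}
        self.rows[row][column] = value

    def get(self, row):
        return self.rows.get(row)

    def scan(self, start, end):
        # All rows with start <= key < end, in key order.
        i = bisect.bisect_left(self.keys, start)
        j = bisect.bisect_left(self.keys, end)
        return [(k, self.rows[k]) for k in self.keys[i:j]]

    def delete(self, row, column=None):
        # Delete a whole row, or a single cell inside a row.
        if column is None:
            self.rows.pop(row, None)
            if row in self.keys:
                self.keys.remove(row)
        elif row in self.rows:
            self.rows[row].pop(column, None)

t = TinyBigTable()
t.put("145", "Name", "Bob")
t.put("150", "Name", "Eve")
t.put("188", "Name", "John")
rows = t.scan("145", "188")   # rows "145" and "150", in key order
```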
10. Security
Dynamo
• No security features.
BigTable
• Access control rights are granted at the column family level. For example, one client may only view the "Personal Data" family, another may view and update "Personal Data", and a third may view and update all the data in the table.
11. Partitioning
Dynamo
• Consistent hashing:
– Each node is assigned a random position on the ring.
– A key is hashed to a fixed point on the ring.
– The node is chosen by walking clockwise from the hash location.
BigTable
• Data is stored ordered by row key.
• Each table consists of a set of tablets.
• Each tablet is assigned to exactly one tablet server.
• A METADATA table stores the location of each tablet under a row key.
(Figures: a hash ring of nodes A-G with hash(key) resolved clockwise to a node; tablets covering contiguous row-key ranges, e.g. ids up to 20000 in Tablet 1 and 20001-25000 in Tablet 2, spread across tablet servers.)
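Dynamo's consistent-hashing rule can be sketched in a few lines (a minimal illustration: node positions here come from hashing the node names rather than being truly random, and real Dynamo additionally uses virtual nodes):

```python
import bisect
import hashlib

def ring_hash(key):
    # Map a string to a position on the ring [0, 2**32).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

class HashRing:
    """Each node sits at a pseudo-random ring position; a key is served
    by the first node found walking clockwise from hash(key)."""
    def __init__(self, nodes):
        self.ring = sorted((ring_hash(n), n) for n in nodes)
        self.positions = [p for p, _ in self.ring]

    def node_for(self, key):
        i = bisect.bisect_right(self.positions, ring_hash(key))
        return self.ring[i % len(self.ring)][1]  # wrap around the ring

ring = HashRing(["A", "B", "C", "D", "E", "F", "G"])
owner = ring.node_for("user:145")   # deterministic: always the same node
```

Because only the clockwise successor of a position owns it, adding or removing one node moves only the keys in that node's arc, which is the point of consistent hashing.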
12. Replication
Dynamo
• Each data item is replicated at N nodes (N is a user-defined parameter).
• Each key K is assigned to a coordinator node.
• The coordinator stores the data associated with K locally, and also replicates it at the N-1 healthy clockwise successor nodes in the ring.
BigTable
• Each tablet is stored in GFS as a sequence of read-only files called SSTables.
• SSTables are divided into fixed-size chunks, and these chunks are stored on chunkservers.
• Each chunk in GFS is replicated across multiple chunkservers.
(Figures: with N = 3, a key hashed onto the ring of nodes A-G is stored on the coordinator and its two clockwise successors; an SSTable split into chunks 1-3, each chunk replicated on two of three chunkservers.)
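Dynamo's replica placement, the coordinator plus its N-1 clockwise successors, can be sketched as a "preference list" (an illustrative toy: node positions come from hashing names, and failure handling is omitted):

```python
import bisect
import hashlib

def ring_hash(key):
    # Map a string to a position on the ring [0, 2**32).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

class ReplicatedRing:
    """Dynamo-style placement: the coordinator for a key plus its
    N-1 clockwise successors on the ring hold the N replicas."""
    def __init__(self, nodes, n=3):
        self.ring = sorted((ring_hash(x), x) for x in nodes)
        self.positions = [p for p, _ in self.ring]
        self.n = n

    def preference_list(self, key):
        # Walk clockwise starting from the coordinator.
        i = bisect.bisect_right(self.positions, ring_hash(key))
        size = len(self.ring)
        return [self.ring[(i + k) % size][1] for k in range(self.n)]

ring = ReplicatedRing(["A", "B", "C", "D", "E", "F", "G"], n=3)
replicas = ring.preference_list("user:145")  # coordinator + 2 successors
```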
13. Storage
Dynamo
• Each node in Dynamo has a local persistence engine where data items are stored as binary objects.
• Different Dynamo instances may use different persistence engines (e.g. MySQL, BDB).
• Applications choose the persistence engine based on their object size distribution.
BigTable
• Data is stored in GFS in the SSTable file format.
• An SSTable is an immutable ordered map whose keys and values are arbitrary strings.
• SSTables support "get by key" and "get by key range" requests.
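The SSTable contract, an immutable ordered map with point and range lookups, can be sketched in memory (an illustration of the interface only; real SSTables live on disk with a block index, not in Python lists):

```python
import bisect

class SSTable:
    """Immutable ordered map from string keys to string values,
    supporting the two lookups the slide names: get by key and
    get by key range."""
    def __init__(self, items):
        pairs = sorted(items.items())      # built once, never mutated
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]

    def get(self, key):
        # Binary search for an exact key.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

    def get_range(self, start, end):
        # All (key, value) pairs with start <= key < end, in key order.
        i = bisect.bisect_left(self.keys, start)
        j = bisect.bisect_left(self.keys, end)
        return list(zip(self.keys[i:j], self.values[i:j]))

sst = SSTable({"188": "John", "145": "Bob", "150": "Eve"})
```

Immutability is what lets BigTable store SSTables in GFS, whose chunks are effectively append-only, and share them safely between servers.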
14. Membership and Failure Detection
Dynamo
• Gossip-based protocol:
– Each node contacts a peer chosen at random every second, and the two nodes exchange their membership data (every node maintains a persistent view of the membership).
BigTable
• Failed tablet servers are identified by regular handshakes between the master and all tablet servers.
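The gossip exchange can be sketched as follows (an illustrative toy: real Dynamo gossips timestamped membership records and detects failures, here each round merely merges two sets of known peers):

```python
import random

class Node:
    """Gossip-based membership: every round each node contacts a random
    peer and the two merge their membership views."""
    def __init__(self, name):
        self.name = name
        self.view = {name}   # nodes this one currently knows about

    def exchange(self, peer):
        # Both sides end up with the union of the two views.
        merged = self.view | peer.view
        self.view = set(merged)
        peer.view = set(merged)

nodes = [Node(c) for c in "ABCDEFG"]
# Seed: every node knows itself and the seed node A; A knows everyone.
for n in nodes[1:]:
    n.view.add("A")
    nodes[0].view.add(n.name)

for _ in range(10):                  # ten gossip rounds
    for n in nodes:
        n.exchange(random.choice(nodes))

full = {n.name for n in nodes}
converged = sum(1 for n in nodes if n.view == full)
```

Views only grow under union, so after a handful of rounds the full membership spreads epidemically to almost every node without any central coordinator.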
15. Dynamo vs. BigTable
• Architecture: Dynamo - decentralized; BigTable - centralized
• Data model: Dynamo - key-value; BigTable - sorted map
• API: Dynamo - get, put; BigTable - get, put, scan, delete
• Security: Dynamo - none; BigTable - access control
• Partitioning: Dynamo - consistent hashing; BigTable - key-range based
• Replication: Dynamo - successor nodes in the ring; BigTable - chunkservers in GFS
• Storage: Dynamo - plug-in persistence engines; BigTable - SSTables in GFS
• Membership and failure detection: Dynamo - gossip-based protocol; BigTable - handshakes initiated by the master