Dynamo and BigTable - Review and Comparison

  1. Dynamo and BigTable - Review and Comparison
     IEEEI 2014
     Grisha Weintraub
  2. Outline
     • Introduction to NoSQL
     • Introduction to Dynamo and BigTable
     • Dynamo vs. BigTable comparison
     • Open source implementations
  3. Introduction to NoSQL
     • New generation of databases
     • Response to a “big data” challenge
     • Main characteristics:
       – Non-relational
       – Distributed
       – Fault tolerant
       – Scalable
  4. Introduction to NoSQL
  5. Dynamo and BigTable - Introduction
     Dynamo (Amazon): highly available key-value store
     • Giuseppe DeCandia, et al.: Dynamo: Amazon's Highly Available Key-value Store. SOSP 2007
     BigTable (Google): structured data
     • Fay Chang, et al.: Bigtable: A Distributed Storage System for Structured Data. OSDI 2006
  6. Dynamo vs. BigTable
     Comparison criteria:
     • Architecture
     • Data model
     • API
     • Security
     • Partitioning
     • Replication
     • Storage
     • Membership and failure detection
  7. Architecture
     Dynamo
     • Decentralized:
       – Every node has the same set of responsibilities as its peers.
       – There is no single point of failure.
     BigTable
     • Centralized:
       – A single master node maintains all system metadata.
       – Other nodes (tablet servers) handle read and write requests.
  8. Data Model
     Dynamo
     • Key-value: data is stored as <key, value> pairs, where the key is a unique identifier and the value is an arbitrary entry.
     • Example:
       – Key 188: { "Name" : "John", "Email" : "john@g.com", "Card" : "6652" }
       – Key 145: { "Name" : "Bob", "Phone" : "781455", "Card" : "9875" }
     BigTable
     • Multidimensional sorted map: the map is indexed by a row key and a column key, and ordered by row key. Column keys are grouped into sets called column families.
     • Example (row key = User ID; column families = Personal Data, Financial Data):
       – Row 145: Personal Data { Name = "Bob", Phone = "781455" }, Financial Data { Card = "9875" }
       – Row 188: Personal Data { Name = "John", Email = "john@g.com" }, Financial Data { Card = "6652" }
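
The two data models on this slide can be contrasted with a minimal Python sketch. It reuses the example records from the slide; the family names `personal` and `financial`, the `family:qualifier` column naming, and the helper functions are assumptions made for illustration, not part of either system's interface.

```python
# Dynamo-style view: the store treats each value as an opaque blob under a key.
kv_store = {
    "188": b'{"Name": "John", "Email": "john@g.com", "Card": "6652"}',
    "145": b'{"Name": "Bob", "Phone": "781455", "Card": "9875"}',
}

# BigTable-style view: (row key, "family:qualifier") -> value.
# The family names below are illustrative stand-ins for the slide's
# "Personal Data" and "Financial Data" column families.
cells = {
    ("188", "personal:name"): "John",
    ("188", "personal:email"): "john@g.com",
    ("188", "financial:card"): "6652",
    ("145", "personal:name"): "Bob",
    ("145", "personal:phone"): "781455",
    ("145", "financial:card"): "9875",
}


def read_row(row_key, family=None):
    """Return the cells of one row, optionally restricted to one column family."""
    return {
        column: value
        for (row, column), value in sorted(cells.items())
        if row == row_key and (family is None or column.startswith(family + ":"))
    }


def row_keys_in_order():
    """Row keys in sorted order, as a map ordered by row key would expose them."""
    return sorted({row for row, _ in cells})
```

In the key-value view, reading Bob's phone number means fetching and decoding the whole value; in the sorted-map view, `read_row("145", "personal")` addresses only the cells of one column family.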
  9. API
     Dynamo
     • get – returns the object associated with the given key.
     • put – associates the given object with the specified key.
     BigTable
     • get – returns values from individual rows.
     • scan – iterates over multiple rows.
     • put – inserts a value into the specified table cell.
     • delete – deletes a whole row or a specified cell inside a particular row.
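
The difference in API surface can be sketched with two small in-memory classes. The class names, method signatures, and the half-open `[start, stop)` scan range are assumptions for illustration; they mirror the operations listed on the slide rather than the real client libraries.

```python
from bisect import bisect_left


class KeyValueStore:
    """Dynamo-like interface: only get and put by key."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, obj):
        self._data[key] = obj


class SortedTable:
    """BigTable-like interface: get, scan, put and delete over rows and cells."""

    def __init__(self):
        self._rows = {}  # row key -> {column key -> value}

    def put(self, row, column, value):
        self._rows.setdefault(row, {})[column] = value

    def get(self, row):
        return dict(self._rows.get(row, {}))

    def scan(self, start, stop):
        """Yield (row key, columns) for start <= row key < stop, in key order."""
        keys = sorted(self._rows)
        i = bisect_left(keys, start)
        while i < len(keys) and keys[i] < stop:
            yield keys[i], dict(self._rows[keys[i]])
            i += 1

    def delete(self, row, column=None):
        """Delete a whole row, or a single cell when a column is given."""
        if column is None:
            self._rows.pop(row, None)
            return
        columns = self._rows.get(row, {})
        columns.pop(column, None)
        if not columns:
            self._rows.pop(row, None)
```

The scan operation is only meaningful because rows are kept in key order, which is why it appears in the BigTable column and not in the Dynamo one.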
  10. Security
      Dynamo
      • No security features
      BigTable
      • Access control rights are granted at column family level.
      • Example (rows 145 and 188 with column families Personal Data and Financial Data), three access levels:
        – Views Personal Data
        – Views/Updates Personal Data
        – Views/Updates all the Data
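
Access control at column family level can be sketched as a lookup from user to family to permitted actions. The user names (`viewer`, `editor`, `admin`), the family names, and the `is_allowed` helper are hypothetical; they only model the three access levels shown in the slide's example.

```python
# Hypothetical access-control list: user -> column family -> allowed actions.
ACL = {
    "viewer": {"personal": {"read"}},
    "editor": {"personal": {"read", "write"}},
    "admin": {
        "personal": {"read", "write"},
        "financial": {"read", "write"},
    },
}


def is_allowed(user, column, action):
    """Check an action on a "family:qualifier" column against the family's ACL."""
    family = column.split(":", 1)[0]
    return action in ACL.get(user, {}).get(family, set())
```

Because the check uses only the family part of the column name, adding a new column to an existing family requires no change to the access rules.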
  11. Partitioning
      Dynamo
      • Consistent hashing:
        – Each node is assigned a random position on the ring.
        – Each key is hashed to a fixed point on the ring.
        – The node is chosen by walking clockwise from the hash location.
      • Figure: ring of nodes A to G with the position of hash(key) marked.
      BigTable
      • Data is stored ordered by row key.
      • Each table consists of a set of tablets.
      • Each tablet is assigned to exactly one tablet server.
      • The METADATA table stores the location of a tablet under a row key.
      • Figure: a table split by id into Tablet 1 (ids 15000 to 20000) and Tablet 2 (ids 20001 to 25000), with tablets distributed across Tablet Server 1 and Tablet Server 2.
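
Both partitioning schemes reduce to a search over sorted positions, which the following Python sketch illustrates. Several details are assumptions: node positions are derived by hashing the node name (to keep the sketch deterministic, whereas the slide describes random positions), MD5 is used as the hash for illustration, and the tablet end keys and server names are simplified from the slide's figure.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a point on the ring (MD5 chosen here for illustration)."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16)


class Ring:
    """Consistent-hashing ring: a key belongs to the first node clockwise from it."""

    def __init__(self, nodes):
        self._points = sorted((ring_position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._points]

    def node_for(self, key: str) -> str:
        # First node position strictly greater than hash(key); wrap past the end.
        i = bisect.bisect_right(self._positions, ring_position(key))
        return self._points[i % len(self._points)][1]


# Key-range partitioning: each tablet is identified by the last row id it holds.
# The server assignment below is hypothetical.
TABLET_END_KEYS = [20000, 25000]
TABLET_SERVERS = ["tablet-server-1", "tablet-server-2"]


def tablet_server_for(row_id: int) -> str:
    """Find the tablet whose end key is the first one >= row_id."""
    i = bisect.bisect_left(TABLET_END_KEYS, row_id)
    if i == len(TABLET_END_KEYS):
        raise KeyError(row_id)
    return TABLET_SERVERS[i]
```

With consistent hashing, adjacent keys land on unrelated nodes, while key-range partitioning keeps neighbouring row ids in the same tablet, which is what makes range scans cheap.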
  12. Replication
      Dynamo
      • Each data item is replicated at N nodes (N is a user-defined parameter).
      • Each key K is assigned to a coordinator node.
      • The coordinator stores the data associated with K locally, and also replicates it at the N-1 healthy clockwise successor nodes in the ring.
      • Figure: ring of nodes A to G with hash(key) marked, N = 3.
      BigTable
      • Each tablet is stored in GFS as a sequence of read-only files called SSTables.
      • SSTables are divided into fixed-size chunks, and these chunks are stored on chunkservers.
      • Each chunk in GFS is replicated across multiple chunkservers.
      • Figure: SSTable1 to SSTable3 divided into Chunk1 to Chunk3, with each chunk stored on two of three chunkservers.
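
Dynamo's replica placement can be sketched as a walk around the ring that skips unhealthy nodes. The function name `preference_list`, the list-based ring representation, and the `healthy` parameter are assumptions for illustration.

```python
def preference_list(ring_order, coordinator, n, healthy=None):
    """Return the coordinator plus its next n-1 healthy clockwise successors.

    ring_order lists the nodes in clockwise order; healthy is the set of
    nodes currently considered reachable (all nodes when omitted).
    """
    healthy = set(ring_order) if healthy is None else set(healthy)
    start = ring_order.index(coordinator)
    replicas = [coordinator]
    for step in range(1, len(ring_order)):
        if len(replicas) == n:
            break
        candidate = ring_order[(start + step) % len(ring_order)]
        if candidate in healthy:
            replicas.append(candidate)
    return replicas
```

For example, on a ring `A` to `G` with `n=3`, a key coordinated by `B` would be stored on `B`, `C` and `D`; if `C` is unhealthy, the walk continues and picks `E` instead.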
  13. Storage
      Dynamo
      • Each node in Dynamo has a local persistence engine where data items are stored as binary objects.
      • Different Dynamo instances may use different persistence engines (e.g. MySQL, BDB).
      • Applications choose the persistence engine based on their object size distribution.
      BigTable
      • Data is stored in GFS in the SSTable file format.
      • An SSTable is an immutable ordered map whose keys and values are arbitrary strings.
      • An SSTable supports "get by key" and "get by key range" requests.
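
The SSTable contract described above (immutable, ordered, lookup by key and by key range) can be sketched with an in-memory stand-in. The class and method names are assumptions, and the real file format with its on-disk blocks and index is not modelled here.

```python
import bisect


class SSTable:
    """Immutable, sorted, in-memory stand-in for an SSTable file."""

    def __init__(self, items):
        pairs = sorted(dict(items).items())
        # Tuples are used so the contents cannot be modified after construction.
        self._keys = tuple(k for k, _ in pairs)
        self._values = tuple(v for _, v in pairs)

    def get(self, key):
        """Get by key: binary search over the sorted keys."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def get_range(self, start, stop):
        """Get by key range: (key, value) pairs with start <= key < stop."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))
```

Since the table never changes after it is built, both lookups can rely on the sort order without any locking.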
  14. Membership and Failure Detection
      Dynamo
      • Gossip-based protocol:
        – Every second, each node contacts a peer chosen at random, and the two nodes exchange their membership data (every node maintains a persistent view of the membership).
      • Figure: ring of nodes A to G.
      BigTable
      • Failed tablet servers are identified by regular handshakes between the master and all tablet servers.
      • Figure: master connected to the tablet servers.
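
How gossip spreads membership data can be illustrated with a small simulation. This is a simplified model, not Dynamo's protocol: a membership view is just a set of node names, each node initially knows only itself, and peers are drawn from the full node list (standing in for whatever bootstrap mechanism a real deployment uses). The function names and the `seed` parameter are assumptions.

```python
import random


def gossip_round(views, rng):
    """One round: every node picks a random peer and the two merge their views."""
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = views[node] | views[peer]
        views[node] = set(merged)
        views[peer] = set(merged)


def rounds_until_converged(node_names, seed=0, max_rounds=100):
    """Count gossip rounds until every node's view contains all nodes."""
    rng = random.Random(seed)
    views = {name: {name} for name in node_names}
    everyone = set(node_names)
    for round_number in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(view == everyone for view in views.values()):
            return round_number
    return None  # did not converge within max_rounds
```

Calling `rounds_until_converged(list("ABCDEFG"))` reports how many rounds a seven-node ring needs before all views agree, with no central coordinator involved at any point.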
  15. Dynamo vs. BigTable
      Criterion                          Dynamo                         BigTable
      Architecture                       decentralized                  centralized
      Data model                         key-value                      sorted map
      API                                get, put                       get, put, scan, delete
      Security                           no                             access control
      Partitioning                       consistent hashing             key range based
      Replication                        successor nodes in the ring    chunkservers in GFS
      Storage                            plug-in                        SSTables in GFS
      Membership and failure detection   gossip-based protocol          handshakes initiated by master
  16. Open source implementations
  17. Thank You
