MegaStore
Google Inc.

Jason Baker, Chris Bond, James C Corbett, JJ Furman,
Andrey Khorlin, James Larson, Jean-Michel Leon, Yawei Li,
Alexander Lloyd, Vadim Yushprakh. CIDR 2011.

Presented by: Noha Elprince
22 June, 2011
What is Megastore?
§  A storage system developed to meet the requirements of today’s online interactive services.

§  Megastore is the data engine supporting Google App Engine (GAE): https://appengine.google.com/

§  GAE cloud computing technology: hosts and virtualizes web apps across multiple servers on Google’s platform.
Ø  Fast development and deployment.
Ø  Simple administration.
Ø  No need to worry about hardware, patches, backups, or scalability.

Outline
•  Motivation & Problem
•  Methodology
•  Design of Megastore
   -  Data Model
   -  Data Storage
   -  Transactions and Concurrency Control
•  How Megastore achieves Availability and Scalability
   -  Paxos
   -  Megastore’s approach
•  Experience
•  Related Work
•  Conclusion

Megastore: Motivation
•  Storage requirements of today’s interactive online applications:
   -  Highly scalable
   -  Rapid development
   -  Low latency
   -  Durability and consistency
   -  Availability and fault tolerance

•  These requirements are in conflict!

CAP Theorem (Eric Brewer, 2000)
“In a distributed database system, you can only have at most two of the following three characteristics:
Ø  Consistency
Ø  Availability
Ø  Partition tolerance”

ACID = Atomicity, Consistency, Isolation, Durability.

Problem
§  Trade-offs among available systems:
•  RDBMS
   Rich feature set and an expressive query language that eases development, but difficult to scale.
   E.g. MySQL, PostgreSQL, MS SQL Server, Oracle Database.

•  NoSQL datastores
   Highly scalable, but with limited APIs and loose consistency models.
   E.g. Google’s Bigtable, Apache HBase, Cassandra (originally developed at Facebook).

§  The reliability of a single datacenter cannot be guaranteed 100%.
   [“Always expect the unexpected”, James Patterson]

Methodology
•  Megastore blends the scalability of NoSQL with the convenience of a traditional RDBMS.

•  High reliability is achieved by:
Ø  Keeping data in multiple datacenters.
Ø  Writing synchronously to a majority of datacenters.
Ø  Letting the infrastructure decide which datacenter to read from and write to.

Design of Megastore: Data Model
•  The data model is declared in a schema.
•  Each schema has a set of tables; each table is either a root table or a child table.
•  Entity group: a root entity together with all of its child entities.

CREATE SCHEMA PhotoApp;

CREATE TABLE User {
  required int64 user_id;
  required string name;
} PRIMARY KEY(user_id), ENTITY GROUP ROOT;

CREATE TABLE Photo {
  required int64 user_id;
  required int32 photo_id;
  required int64 time;
  required string full_url;
  optional string thumbnail_url;
  repeated string tag;
} PRIMARY KEY(user_id, photo_id),
  IN TABLE User,
  ENTITY GROUP KEY(user_id) REFERENCES User;

Design of Megastore: Data Model
•  Hierarchical data is de-normalized to eliminate join costs.
•  Joins are implemented at the application level.
•  Outer joins are performed as an index lookup followed by parallel queries using secondary indexes.
•  This provides an efficient stand-in for SQL-style joins.

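The application-level join described on this slide can be sketched in Python. This is only an illustration under stated assumptions, not Megastore's API: the in-memory dictionaries `USERS` and `PHOTOS_BY_USER`, the user 103, and the helpers `lookup_photos` and `left_outer_join_users_photos` are hypothetical stand-ins for an initial lookup followed by parallel secondary-index lookups.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for a root table and a secondary index.
USERS = {101: "John", 102: "Mary", 103: "Ann"}
PHOTOS_BY_USER = {
    101: ["john-pic1", "john-pic2"],
    102: ["mary-pic1", "mary-pic2"],
}


def lookup_photos(user_id):
    """One secondary-index lookup; returns [] when the user has no photos."""
    return PHOTOS_BY_USER.get(user_id, [])


def left_outer_join_users_photos(user_ids):
    """Initial lookup of users, then one parallel photo lookup per user."""
    users = [(uid, USERS[uid]) for uid in user_ids if uid in USERS]
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in the same order as its input.
        photo_lists = list(pool.map(lookup_photos, [uid for uid, _ in users]))
    rows = []
    for (uid, name), photos in zip(users, photo_lists):
        if photos:
            rows.extend((uid, name, url) for url in photos)
        else:
            rows.append((uid, name, None))  # outer join keeps unmatched users
    return rows
```

The join stays cheap only while the initial lookup returns a small result, since every returned row triggers one further lookup.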
Design of Megastore: Data Storage
How is it stored in Bigtable?

“A Bigtable is a compressed, high performance, and proprietary database system built on: Google File System (GFS), Chubby Lock service and other Google programs”

Design of Megastore: Data Storage
Example:
User {user_id: 101, name: 'John'}
Photo {user_id: 101, photo_id: 501, time: 2009, full_url: 'john-pic1', tag: 'vacation', tag: 'holiday', tag: 'Paris'}
Photo {user_id: 101, photo_id: 502, time: 2010, full_url: 'john-pic2', tag: 'office', tag: 'friends', tag: 'pub'}
User {user_id: 102, name: 'Mary'}
Photo {user_id: 102, photo_id: 600, time: 2009, full_url: 'mary-pic1', tag: 'office', tag: 'picnic', tag: 'Paris'}
Photo {user_id: 102, photo_id: 601, time: 2011, full_url: 'mary-pic2', tag: 'birthday', tag: 'friends'}

Row Key    User.name   Photo.time   Photo.tag                   Photo.URL
101        John
101,501                2009         Vacation, Holiday, Paris    …
101,502                2010         Office, friends, pub        …
102        Mary
102,600                2009         Office, Picnic, Paris       …
102,601                2011         Birthday, Friends           …

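The row-key layout in the table above can be illustrated with a short Python sketch. It assumes, purely for illustration, that row keys are modeled as tuples and that a sorted dictionary scan stands in for Bigtable's ordered rows; the helper `entity_group` is hypothetical. The point it shows is that prefixing each Photo key with its user_id keeps every entity group contiguous in key order.

```python
# Entities from the slide's example, keyed the way the table shows:
# a User row key is (user_id,), a Photo row key is (user_id, photo_id).
rows = {
    (102, 601): {"Photo.time": 2011},
    (101,): {"User.name": "John"},
    (102,): {"User.name": "Mary"},
    (101, 502): {"Photo.time": 2010},
    (101, 501): {"Photo.time": 2009},
    (102, 600): {"Photo.time": 2009},
}


def entity_group(rows, root_id):
    """Return the root row and its children in key order (a prefix scan)."""
    return [(key, rows[key]) for key in sorted(rows) if key[0] == root_id]


# Sorting the keys places each root row directly before its child rows.
ordered_keys = sorted(rows)
```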
Design of Megastore: Data Storage
•  Indexing
   -  Local index: finds data within an entity group.
      CREATE LOCAL INDEX PhotosByTime ON Photo(user_id, time);

   -  Global index: spans entity groups.
      CREATE GLOBAL INDEX PhotosByTag ON Photo(tag) STORING (thumbnail_url);

   -  The STORING clause: copies the listed properties into the index entries, so they can be retrieved faster, without a separate lookup of the entity.

Design of Megastore: Data Storage
How is it stored in Bigtable?

PhotosByTime
Row Key
101,2009,101,501
101,2010,101,502
102,2009,102,600
102,2011,102,601

PhotosByTag
Row Key             Thumbnail.Url
Birthday,102,601    …
Friends,101,502     …
Friends,102,601     …
Holiday,101,501     …
Office,101,502      …
Office,102,600      …
Paris,101,501       …
Paris,102,600       …
Pub,101,502         …

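How the index row keys above follow from the entities can be sketched in Python. This is an assumption-laden illustration rather than Megastore code: the `photos` list mirrors the slide's example, the functions `photos_by_time_keys` and `photos_by_tag_keys` are hypothetical, and lower-casing the tags is a choice made here only to get a stable sort order.

```python
photos = [
    {"user_id": 101, "photo_id": 501, "time": 2009, "tags": ["vacation", "holiday", "Paris"]},
    {"user_id": 101, "photo_id": 502, "time": 2010, "tags": ["office", "friends", "pub"]},
    {"user_id": 102, "photo_id": 600, "time": 2009, "tags": ["office", "picnic", "Paris"]},
    {"user_id": 102, "photo_id": 601, "time": 2011, "tags": ["birthday", "friends"]},
]


def photos_by_time_keys(photos):
    """Local index on (user_id, time): indexed values, then the primary key."""
    return sorted((p["user_id"], p["time"], p["user_id"], p["photo_id"]) for p in photos)


def photos_by_tag_keys(photos):
    """Global index on the repeated tag field: one entry per tag per photo."""
    return sorted(
        (tag.lower(), p["user_id"], p["photo_id"])  # lower-casing is an assumption
        for p in photos
        for tag in p["tags"]
    )
```

Because `tag` is a repeated field, a single photo contributes several index entries, which is why the PhotosByTag table has more rows than there are photos.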
Transactions and Concurrency Control
•  Each entity group acts as a mini-database that provides ACID semantics.

•  Transactions are managed with write-ahead logging (WAL).

•  A Bigtable feature is used: the ability to store multiple values in the same row/column, each with a different timestamp.

•  Cross-entity-group transactions are supported via two-phase commit (2PC).

•  Entities within an entity group are updated under multiversion concurrency control (MVCC).
Transactions and Concurrency Control
•  MVCC: multiversion concurrency control
   Using timestamps, reads and writes do not block each other.

•  Read consistency
   -  Current: ensures that all previously committed writes are applied, then reads the latest committed value.
   -  Snapshot: does not wait; reads at the timestamp of the last fully applied transaction.
   -  Inconsistent: ignores the state of the log and reads the latest values directly (data may be stale).

•  Write consistency
   -  Determines the next available log position.
   -  Assigns the mutations in the write-ahead log (WAL) entry a timestamp higher than any previous one.
   -  Uses Paxos to settle contention: one writer wins the right to write to a given log position of an entity group; the others abort and retry their operations.
   -  This is optimistic concurrency control (OCC) for mutations (write operations): each transaction assumes that its data does not conflict with others and proceeds without locks.
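
The two ideas on this slide, timestamped versions and optimistic appends to a log position, can be sketched in Python. The classes `VersionedCell` and `EntityGroupLog` are hypothetical, single-process stand-ins: in Megastore the versions live in Bigtable cells and the winner of a log position is decided by Paxos, neither of which is modeled here.

```python
class VersionedCell:
    """A single cell that keeps every written value with its commit timestamp."""

    def __init__(self):
        self._versions = []  # list of (timestamp, value), in increasing timestamp order

    def write(self, timestamp, value):
        if self._versions and timestamp <= self._versions[-1][0]:
            raise ValueError("timestamp must be higher than any previous one")
        self._versions.append((timestamp, value))

    def read(self, at_timestamp):
        """Return the newest value written at or before at_timestamp, else None."""
        result = None
        for ts, value in self._versions:
            if ts > at_timestamp:
                break
            result = value
        return result


class EntityGroupLog:
    """Optimistic appends: a writer targets a position and loses if it is taken."""

    def __init__(self):
        self.entries = []

    def next_position(self):
        return len(self.entries)

    def try_append(self, position, entry):
        if position != len(self.entries):
            return False  # another writer won this position: abort and retry
        self.entries.append(entry)
        return True
```

A reader that picks a timestamp and calls `read` never waits for a concurrent `write`, which is the sense in which reads and writes do not block each other.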
Transactions and Concurrency Control
•  Queues
§  Provide transactional messaging between entity groups.
§  Each message has a single sending and a single receiving entity group; delivery is synchronous when the two are the same and asynchronous when they differ.
§  Useful for operations that affect many entity groups.

Fig. Operations across entity groups

Transactions and Concurrency Control
•  Two-Phase Commit (2PC)
§  Coordinator: the component that receives the commit/abort request.
§  Participants: the resource managers that did work on behalf of the transaction (by reading or updating resources).
§  Goal: ensure that the coordinator and all participants either all commit or all abort the transaction, so atomicity is satisfied. Source: Ref[2]

Disadvantage: high latency.
Advantage: simplifies application code for unique secondary key enforcement.

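The coordinator and participant roles can be sketched in Python as a minimal, in-process illustration. The class `Participant` and the function `two_phase_commit` are hypothetical names; logging, timeouts, and recovery from coordinator failure, which a real implementation needs, are deliberately left out.

```python
class Participant:
    """A resource manager that votes in phase 1 and obeys the decision in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Vote yes only if this participant is able to commit its work.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Phase 1: collect votes. Phase 2: commit only if every vote was yes."""
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

Each phase is a full round trip to every participant, which is where the extra latency noted on the slide comes from.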
Other Features
•  Integrated backup system
Ø  Restores an entity group’s state to any point in time.

•  Data encryption
Ø  Uses a distinct key per entity group.

Megastore – Availability / Scalability
•  Megastore replication system
   -  Replication is done per entity group, by synchronously replicating the group’s transaction log to a number of replicas.
   -  Reads and writes can be initiated from any replica.
   -  Writes typically require one round of inter-datacenter communication.
   -  ACID semantics are preserved regardless of which replica a client starts from.

Fig. Scalable Replication
Megastore – Replication
•  Paxos algorithm
   -  A way to reach consensus among a group of replicas on a single value.
   -  Databases typically use Paxos to replicate a transaction log, with a separate instance of Paxos for each position in the log.

Source: Ref[3]

Advantage: tolerates delayed or reordered messages and replicas that fail by stopping (with 2F+1 replicas it tolerates up to F failures, i.e. fewer than half).
Disadvantage: high latency, because it demands multiple rounds of communication, so Megastore uses an improved version.
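
A single instance of basic Paxos, deciding one value, can be sketched in Python. This is a textbook-style illustration, not Megastore's optimized variant: the names `Acceptor`, `majority`, and `propose` are hypothetical, all calls are local function calls, and message loss, delays, and replica failures are not modeled.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1    # highest proposal number promised so far
        self.accepted = None  # (proposal number, value) or None

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def majority(n_replicas):
    return n_replicas // 2 + 1


def propose(acceptors, n, value):
    """Run one proposal; return the chosen value, or None without a majority."""
    replies = [a.prepare(n) for a in acceptors]
    promises = [prev for ok, prev in replies if ok]
    if len(promises) < majority(len(acceptors)):
        return None
    previously_accepted = [prev for prev in promises if prev is not None]
    if previously_accepted:
        # Must re-propose the value of the highest-numbered accepted proposal.
        value = max(previously_accepted, key=lambda p: p[0])[1]
    accepted = sum(a.accept(n, value) for a in acceptors)
    return value if accepted >= majority(len(acceptors)) else None
```

The two phases are the "multiple rounds of communication" named as the disadvantage: once a value is chosen, a later proposal with a higher number can only re-confirm it.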
Megastore – Replication
•  Master-based approach
Ø  A master-slave model is generally used, in which the master handles all replication of writes.
Ø  But the master becomes a bottleneck.
Megastore – Replication
•  Megastore replication system (modified Paxos)
§  Fast reads
   -  Local reads are allowed from anywhere.
   -  A coordinator at each replica tracks the set of entity groups for which its replica has observed all Paxos writes; the replica serves local reads for those entity groups.

§  Fast writes
   -  A specific replica is chosen as the leader for each log position.
   -  The leader arbitrates which value may use proposal number zero.
   -  The first writer to submit a value to the leader wins the right to ask all replicas to accept that value.
   -  The next write’s leader is selected with the closest-replica heuristic, which minimizes writer-leader latency based on the observation that most applications submit writes from the same region repeatedly.
Experience
•  Real-world deployment
   -  More than 100 production applications use Megastore (e.g. Google App Engine).
   -  Most applications see extremely high availability.
   -  Most users see average write latencies of 100 to 400 ms.
Related Work
•  NoSQL data storage systems
   -  Bigtable, Cassandra, Yahoo PNUTS, Amazon SimpleDB
•  Data replication
   -  HBase, CouchDB, Dynamo, …
   -  These extend the replication schemes of traditional RDBMS systems.
•  Paxos algorithm
   -  SCALARIS, Keyspace, …
   -  Few systems have used Paxos to achieve synchronous replication.
Conclusion
Megastore
Ø  A scalable, highly available datastore for interactive internet services.
Ø  Uses Paxos for synchronous replication.
Ø  Uses Bigtable as the scalable datastore while adding richer primitives (ACID transactions, indexes).
Ø  Has over 100 applications in production.

Megastore
Any Questions?
References
[1] Jason Baker et al. “Megastore: Providing Scalable, Highly Available Storage for Interactive Services.” CIDR 2011.
[2] Philip A. Bernstein, Eric Newcomer. “Principles of Transaction Processing.” Morgan Kaufmann, 2009.
[3] http://paprika.umw.edu/~ernie/cpsc321/10312006.html
[4] Google Megastore presentation at SIGMOD 2008. http://perspectives.mvdirona.com/2008/07/10/GoogleMegastore.aspx

Megastore – Replication
•  Megastore read process
   -  Each replica stores mutations and metadata for the log entries.
   -  Read process:
      1. Query local: check whether the local replica is up to date.
      2. Find position: determine the highest log position and select a replica.
      3. Catchup: check the consensus value from other replicas.
      4. Validate: synchronize with the up-to-date state.
      5. Query data: read the data at the selected timestamp.
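
The five read steps can be walked through in a simplified Python sketch. Everything here is an assumption made for illustration: the `Replica` class, its `up_to_date` set (standing in for the coordinator), and `current_read` are hypothetical, and catchup simply copies log entries from the most advanced replica instead of running Paxos to learn the consensus value.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.log = {}            # entity group -> list of applied log entries
        self.up_to_date = set()  # coordinator state: groups known to be current

    def applied_position(self, group):
        return len(self.log.get(group, []))


def current_read(local, others, group):
    """Return (log position to read at, whether the fast local path was used)."""
    # 1. Query local: ask the coordinator whether the group is up to date here.
    if group in local.up_to_date:
        return local.applied_position(group), True
    # 2. Find position: highest log position known to any replica.
    best = max([local] + others, key=lambda r: r.applied_position(group))
    target = best.applied_position(group)
    # 3. Catchup: copy the missing log entries from the most advanced replica.
    missing = best.log.get(group, [])[local.applied_position(group):]
    local.log.setdefault(group, []).extend(missing)
    # 4. Validate: mark the group as up to date in the local coordinator.
    local.up_to_date.add(group)
    # 5. Query data: the caller reads at the timestamp of this position.
    return target, False
```

After one slow read has caught the replica up, later reads of the same entity group take the local fast path until a write invalidates the coordinator entry.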
Megastore – Replication
•  Megastore write process
   -  Each replica stores mutations and metadata for the log entries.
   -  Write process:
      1. Accept leader: ask the leader to accept the value as proposal number zero; if it does, skip to step 3.
      2. Prepare: run the Paxos Prepare phase at all replicas.
      3. Accept: ask the remaining replicas to accept the value.
      4. Invalidate: handle the fault case by invalidating replicas that did not accept the value.
      5. Apply: apply the value’s mutations at as many replicas as possible.
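
The write steps can be sketched in Python under strong simplifications. The `Replica` class and the `write` function are hypothetical; the sketch models only the fast path through the leader, so the Prepare fallback of step 2 is represented by giving up, and invalidation is a simple flag rather than a call to a remote coordinator.

```python
class Replica:
    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.accepted = {}      # log position -> accepted value
        self.applied = {}       # log position -> applied value
        self.up_to_date = True  # coordinator flag for this entity group

    def accept(self, position, value):
        """Accept the value unless unreachable or the position is already taken."""
        if not self.reachable:
            return False
        if position in self.accepted and self.accepted[position] != value:
            return False
        self.accepted[position] = value
        return True


def write(replicas, leader, position, value):
    """Return True if the value was chosen for the log position."""
    # 1. Accept leader: the first writer to reach the leader wins the fast path.
    if not leader.accept(position, value):
        # 2. Prepare: a full Paxos Prepare round would run here (not modeled).
        return False
    # 3. Accept: ask the remaining replicas to accept the value.
    acks = [leader] + [r for r in replicas if r is not leader and r.accept(position, value)]
    if len(acks) <= len(replicas) // 2:
        return False
    # 4. Invalidate: replicas that did not accept may no longer serve local reads.
    for r in replicas:
        if r not in acks:
            r.up_to_date = False
    # 5. Apply: apply the mutation at as many replicas as possible.
    for r in acks:
        r.applied[position] = value
    return True
```

A second writer targeting the same log position is refused by the leader, which corresponds to the abort-and-retry behavior of optimistic concurrency.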

 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 

Noha mega store

  • 1. MegaStore Google Inc. Jason Baker, Chris Bond, James C Corbett, JJ Furman, Andrey Khorlin, James Larson, Jean-Michel Leon, Yawei Li, Alexander Lloyd, Vadim Yushprakh. CIDR 2011. Presented by: Noha Elprince 22 June, 2011
  • 2. What is MegaStore?
      - A storage system developed to meet the requirements of today’s online interactive services.
      - Megastore is the data engine supporting Google App Engine (GAE): https://appengine.google.com/
      - GAE is a cloud computing technology that hosts and virtualizes web apps across multiple servers on Google’s platform. It offers fast development and deployment, simple administration, and no need to worry about hardware, patches, backups, or scalability.
  • 3. Outline
      - Motivation & Problem
      - Methodology
      - Design of Megastore: Data Model; Data Storage; Transactions and Concurrency Control
      - How Megastore achieves Availability and Scalability: Paxos; Megastore’s approach
      - Experience
      - Related Work
      - Conclusion
  • 4. Megastore: Motivation
      - Storage requirements of today’s interactive online applications: highly scalable; rapid development; low latency; durability and consistency; availability and fault tolerance.
      - These requirements are in conflict!
  • 5. CAP Theorem (Eric Brewer, 2000): “In a distributed database system, you can only have at most two of the following three characteristics: consistency, availability, partition tolerance.” ACID = Atomicity, Consistency, Isolation, Durability.
  • 6. Problem
      - Trade-offs among the available systems:
          - RDBMS: rich set of features and an expressive language that helps development, but difficult to scale. E.g. MySQL, PostgreSQL, MS SQL Server, Oracle RDB.
          - NoSQL datastores: highly scalable, but with a limited API and loose consistency models. E.g. Google’s Bigtable, Apache HBase, Facebook’s Cassandra.
      - The reliability of a single datacenter cannot be guaranteed 100%. [“Always expect the unexpected”, James Patterson]
  • 7. Methodology
      - Megastore blends the scalability of NoSQL with the convenience of a traditional RDBMS.
      - High reliability is achieved as follows: data lives in multiple datacenters; each write goes to a majority of datacenters synchronously; the infrastructure decides which datacenter to read from and write to.
  • 9. Design of Megastore: Data Model
      - The data model is declared in a schema.
      - Each schema has a set of tables: root tables or child tables.
      - Entity group: consists of a root entity along with all of its child entities.

      CREATE SCHEMA PhotoApp;

      CREATE TABLE User {
        required int64 user_id;
        required string name;
      } PRIMARY KEY(user_id), ENTITY GROUP ROOT;

      CREATE TABLE Photo {
        required int64 user_id;
        required int32 photo_id;
        required int64 time;
        required string full_url;
        optional string thumbnail_url;
        repeated string tag;
      } PRIMARY KEY(user_id, photo_id),
        IN TABLE User,
        ENTITY GROUP KEY(user_id) REFERENCES User;
  • 10. Design of Megastore: Data Model
      - Hierarchical data is denormalized to eliminate join costs.
      - Joins are implemented at the application level.
      - Outer joins are performed with parallel queries using secondary indexes, which provides an efficient stand-in for SQL-style joins.
  • 11. Design of Megastore: Data Storage. How is it stored in Bigtable? “A Bigtable is a compressed, high performance, and proprietary database system built on Google File System (GFS), Chubby Lock Service and other Google programs.”
  • 12. Design of Megastore: Data Storage. Example:
      User{user_id:101, name:'John'}
      Photo{user_id:101, photo_id:501, time:2009, full_url:'john-pic1', tag:'vacation', tag:'holiday', tag:'Paris'}
      Photo{user_id:101, photo_id:502, time:2010, full_url:'john-pic2', tag:'office', tag:'friends', tag:'pub'}
      User{user_id:102, name:'Mary'}
      Photo{user_id:102, photo_id:600, time:2009, full_url:'mary-pic1', tag:'office', tag:'picnic', tag:'Paris'}
      Photo{user_id:102, photo_id:601, time:2011, full_url:'mary-pic2', tag:'birthday', tag:'friends'}

      Row Key   | User.name | Photo.time | Photo.tag                | Photo URL
      101       | John      |            |                          |
      101, 501  |           | 2009       | Vacation, Holiday, Paris | …
      101, 502  |           | 2010       | Office, Friends, Pub     | …
      102       | Mary      |            |                          |
      102, 600  |           | 2009       | Office, Picnic, Paris    | …
      102, 601  |           | 2011       | Birthday, Friends        | …
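The row layout on this slide can be reproduced with a small sketch. It assumes that Python tuples stand in for Bigtable row keys; the helper names `build_rows` and `scan_entity_group` are hypothetical and not part of Megastore. Sorting the keys places each User row directly before its Photo rows, so one entity group is one contiguous key range:

```python
# Illustrative only: tuples stand in for Bigtable row keys.
USERS = {101: "John", 102: "Mary"}
PHOTOS = [  # (user_id, photo_id, time), deliberately out of order
    (102, 601, 2011),
    (101, 501, 2009),
    (102, 600, 2009),
    (101, 502, 2010),
]


def build_rows(users, photos):
    """Map row key -> column values, using the primary keys from the schema."""
    rows = {}
    for user_id, name in users.items():
        rows[(user_id,)] = {"User.name": name}
    for user_id, photo_id, time in photos:
        rows[(user_id, photo_id)] = {"Photo.time": time}
    return rows


def scan_entity_group(rows, user_id):
    """Return the keys of one entity group in sorted (storage) order."""
    return [key for key in sorted(rows) if key[0] == user_id]


rows = build_rows(USERS, PHOTOS)
for key in sorted(rows):
    print(key, rows[key])
```

Because a root key such as `(101,)` is a prefix of its children’s keys, it sorts before them, which is what keeps the whole group adjacent.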
  • 13. Design of Megastore: Data Storage. Indexing
      - Local index: finds data within an entity group.
          CREATE LOCAL INDEX PhotosByTime ON Photo(user_id, time);
      - Global index: spans entity groups.
          CREATE GLOBAL INDEX PhotosByTag ON Photo(tag) STORING (thumbnail_url);
      - The STORING clause gives faster retrieval of certain properties, because they are stored in the index entry itself.
  • 14. Design of Megastore: Data Storage. How are the indexes stored in Bigtable?

      PhotosByTime
      Row Key
      101, 2009, 101, 501
      101, 2010, 101, 502
      102, 2009, 102, 600
      102, 2011, 102, 601

      PhotosByTag
      Row Key            | Thumbnail.Url
      Birthday, 102, 601 | …
      Friends, 101, 502  | …
      Friends, 102, 601  | …
      Holiday, 101, 501  | …
      Office, 101, 502   | …
      Office, 102, 600   | …
      Paris, 101, 501    | …
      Paris, 102, 600    | …
      Pub, 101, 502      | …
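A minimal sketch of how such index rows could be derived from the Photo entities. The function names are hypothetical, and the key layout (indexed values followed by the primary key) is read off the slide rather than taken from any Megastore API. The repeated `tag` field yields one global index entry per tag, so this sketch also produces entries for tags that the slide’s table does not list:

```python
# Illustrative only: index keys are built as (indexed values..., primary key...).
PHOTOS = [
    {"user_id": 101, "photo_id": 501, "time": 2009, "tag": ["vacation", "holiday", "Paris"]},
    {"user_id": 101, "photo_id": 502, "time": 2010, "tag": ["office", "friends", "pub"]},
    {"user_id": 102, "photo_id": 600, "time": 2009, "tag": ["office", "picnic", "Paris"]},
    {"user_id": 102, "photo_id": 601, "time": 2011, "tag": ["birthday", "friends"]},
]


def photos_by_time(photos):
    """Local index on (user_id, time); every key starts with the entity group key."""
    return sorted(
        (p["user_id"], p["time"], p["user_id"], p["photo_id"]) for p in photos
    )


def photos_by_tag(photos):
    """Global index on tag; the stored value mimics STORING (thumbnail_url)."""
    index = {}
    for p in photos:
        for tag in p["tag"]:  # one entry per value of the repeated field
            key = (tag.capitalize(), p["user_id"], p["photo_id"])
            index[key] = p.get("thumbnail_url")  # None here: no URL in the example data
    return index


by_time = photos_by_time(PHOTOS)
by_tag = photos_by_tag(PHOTOS)
for key in sorted(by_tag):
    print(key)
```

A lookup for one tag is then a prefix scan over the sorted keys, and the stored thumbnail URL can be returned without fetching the Photo row.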
  • 16. Transactions and Concurrency Control
      - Each entity group acts as a mini-database that provides ACID semantics.
      - Transactions are managed using write-ahead logging (WAL).
      - Megastore builds on a Bigtable feature: the ability to store multiple values in the same row/column with different timestamps.
      - Cross-entity-group transactions are supported via two-phase commit (2PC).
      - Entities within an entity group use multiversion concurrency control (MVCC).
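The timestamped-versions idea behind MVCC can be sketched as follows. `VersionedCell` is a hypothetical stand-in for a single Bigtable row/column, not a real Bigtable or Megastore API; a reader picks a timestamp and sees the newest value written at or before it, so a later write never disturbs that read:

```python
import bisect


class VersionedCell:
    """One row/column holding several values, each tagged with a timestamp."""

    def __init__(self):
        self._timestamps = []  # kept sorted ascending
        self._values = []      # _values[i] was written at _timestamps[i]

    def write(self, timestamp, value):
        i = bisect.bisect_left(self._timestamps, timestamp)
        if i < len(self._timestamps) and self._timestamps[i] == timestamp:
            self._values[i] = value  # same timestamp: overwrite that version
        else:
            self._timestamps.insert(i, timestamp)
            self._values.insert(i, value)

    def read(self, at_timestamp):
        """Return the newest value with timestamp <= at_timestamp, or None."""
        i = bisect.bisect_right(self._timestamps, at_timestamp)
        return self._values[i - 1] if i else None


cell = VersionedCell()
cell.write(10, "John")
cell.write(20, "Johnny")
print(cell.read(15))  # a reader at timestamp 15 still sees the older version
```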
  • 17. Transactions and Concurrency Control
      - MVCC (multiversion concurrency control): using timestamps, reads and writes do not block each other.
      - Read consistency
          - Current: first ensures that all previously committed writes are applied, then reads the last committed value.
          - Snapshot: does not wait; reads the last committed values that are already fully applied.
          - Inconsistent: ignores the state of the log and reads the latest values directly (data may be stale).
      - Write consistency
          - Determines the next available log position.
          - Assigns the mutations in the write-ahead log (WAL) a timestamp higher than any previous one.
          - Uses Paxos to settle contention for that log position: one writer wins the right to write to the entity group, and the others abort and retry their operations.
          - This is optimistic concurrency control (OCC) for mutations (write operations): transactions assume that their data does not conflict and proceed without locks.
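The write path on this slide amounts to a race for the next log position. The sketch below is a single-process illustration under a strong simplifying assumption: a plain position check stands in for the Paxos round that picks the winner. `EntityGroupLog` and `commit_with_retry` are hypothetical names:

```python
class EntityGroupLog:
    """Write-ahead log of one entity group; entries[i] is log position i."""

    def __init__(self):
        self.entries = []

    def next_position(self):
        return len(self.entries)

    def try_commit(self, position, mutation):
        """Succeed only if nobody else has filled this position in the meantime."""
        if position != len(self.entries):
            return False  # lost the race: another writer won this position
        self.entries.append(mutation)
        return True


def commit_with_retry(log, make_mutation, max_attempts=3):
    """Optimistic loop: no locks are taken; a losing writer re-reads and retries."""
    for _ in range(max_attempts):
        position = log.next_position()
        if log.try_commit(position, make_mutation(position)):
            return position
    raise RuntimeError("too much contention on this entity group")


log = EntityGroupLog()
seen_by_a = log.next_position()
seen_by_b = log.next_position()          # both writers observe the same position
print(log.try_commit(seen_by_a, "A"))    # first writer wins
print(log.try_commit(seen_by_b, "B"))    # second writer must abort and retry
```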
  • 18. Transactions and Concurrency Control: Queues
      - Provide transactional messaging between entity groups.
      - Each message has a single sending and a single receiving entity group; if the two differ, delivery is asynchronous.
      - Useful for operations that affect many entity groups.
      [Figure: Operations across entity groups]
  • 19. Transactions and Concurrency Control: Two-Phase Commit (2PC)
      - Coordinator: the component that receives the commit/abort request.
      - Participants: the resource managers that did work on behalf of the transaction (by reading or updating resources).
      - Goal: ensure that the coordinator and all participants either all commit or all abort the transaction, so that atomicity is satisfied. Source: Ref [2].
      - Disadvantage: high latency.
      - Advantage: simplifies code for unique secondary key enforcement.
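A minimal in-memory sketch of the two phases. The `Participant` class and `two_phase_commit` function are hypothetical, and the sketch ignores timeouts, logging, and recovery, which a real coordinator needs:

```python
class Participant:
    """A resource manager that votes in phase 1 and obeys the decision in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "working"

    def prepare(self):
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): every participant must promise that it can commit.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit everywhere only if the vote was unanimous.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


print(two_phase_commit([Participant("group-101"), Participant("group-102")]))
print(two_phase_commit([Participant("group-101"), Participant("group-102", can_commit=False)]))
```

The two rounds of messages per transaction are where the latency cost noted on the slide comes from.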
  • 20. Other Features
      - Integrated backup system: used to restore an entity group’s state to any point in time.
      - Data encryption: uses a distinct key per entity group.
  • 22. Megastore: Availability / Scalability. Megastore Replication System
      - Replication is done per entity group by synchronously replicating the group’s transaction log to a number of replicas.
      - Reads and writes can be initiated from any replica.
      - Writes typically require one round of inter-datacenter communication.
      - ACID semantics are preserved regardless of which replica a client starts from.
      [Figure: Scalable Replication]
  • 23. Megastore: Replication. Paxos Algorithm
      - A way to reach consensus among a group of replicas on a single value.
      - Databases typically use Paxos to replicate a transaction log, where a separate instance of Paxos is used for each position in the log. Source: Ref [3].
      - Advantage: tolerates delayed or reordered messages and replicas that fail by stopping. A group of N replicas keeps working as long as a majority is alive, i.e. with fewer than N/2 failures.
      - Disadvantage: high latency, because it demands multiple rounds of communication. Megastore therefore uses an improved version.
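To make the two rounds concrete, here is a sketch of basic single-decree Paxos for one log position. It is a synchronous, in-memory simplification (no network, no retries with higher proposal numbers) and it is the textbook algorithm, not Megastore’s optimized variant; `Acceptor` and `propose` are names chosen for this sketch:

```python
class Acceptor:
    def __init__(self):
        self.promised = -1         # highest proposal number promised so far
        self.accepted_n = -1       # number of the proposal accepted, if any
        self.accepted_value = None
        self.alive = True

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return (self.accepted_n, self.accepted_value)
        return None

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors, n, value):
    """Return the chosen value, or None if no majority could be reached."""
    majority = len(acceptors) // 2 + 1
    live = [a for a in acceptors if a.alive]
    promises = [p for p in (a.prepare(n) for a in live) if p is not None]
    if len(promises) < majority:
        return None
    # If any acceptor already accepted a value, the proposer must adopt the
    # one with the highest proposal number instead of its own value.
    prior_n, prior_value = max(promises, key=lambda p: p[0])
    if prior_n >= 0:
        value = prior_value
    acks = sum(1 for a in live if a.accept(n, value))
    return value if acks >= majority else None


replicas = [Acceptor() for _ in range(3)]
print(propose(replicas, 1, "mutation-A"))
print(propose(replicas, 2, "mutation-B"))  # the value already chosen wins again
```

Each call to `propose` costs a prepare round and an accept round, which is the latency problem the slide mentions.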
  • 24. Megastore: Replication. Master-Based Approach
      - A master-slave model is generally used, in which the master handles the replication of all writes.
      - The single master then becomes a bottleneck.
  • 25. Megastore: Replication. Megastore Replication System (modified Paxos)
      - Fast reads
          - Local reads are allowed from anywhere.
          - Each replica tracks the set of entity groups for which it has observed all Paxos writes, and serves local reads for those groups.
      - Fast writes
          - A specific replica is chosen as the leader for each log position.
          - The leader arbitrates which writer’s value gets the first proposal number.
          - The first writer to submit a value to the leader wins the right to ask all replicas to accept that value.
      - The leader for the next write is selected with a closest-replica heuristic. The aim is to minimize writer-leader latency, based on the observation that most applications submit writes from the same region repeatedly.
  • 27. Experience: Real-world deployment
      - More than 100 production applications use Megastore (e.g. Google App Engine).
      - Most applications see extremely high availability.
      - Most users see average write latencies of 100 to 400 ms.
  • 28. Related Work
      - NoSQL data storage systems: Bigtable, Cassandra, Yahoo PNUTS, Amazon SimpleDB.
      - Data replication process: HBase, CouchDB, Dynamo, … These extend the replication scheme of traditional RDBMS systems.
      - Paxos algorithm: SCALARIS, Keyspace, … Few have used Paxos to achieve synchronous replication.
  • 29. Conclusion. Megastore:
      - Is a scalable, highly available datastore for interactive internet services.
      - Uses Paxos for synchronous replication.
      - Uses Bigtable as the scalable datastore while adding richer primitives (ACID transactions, indexes).
      - Has over 100 applications in production.
  • 31. References
      - [1] Jason Baker et al. “Megastore: Providing Scalable, Highly Available Storage for Interactive Services.” CIDR 2011.
      - [2] Philip A. Bernstein and Eric Newcomer. “Principles of Transaction Processing.” Morgan Kaufmann, 2009.
      - [3] http://paprika.umw.edu/~ernie/cpsc321/10312006.html
      - [4] Google Megastore presentation at SIGMOD 2008. http://perspectives.mvdirona.com/2008/07/10/GoogleMegastore.aspx
  • 32. Megastore: Replication. Megastore Read Process
      - Each replica stores mutations and metadata for the log entries.
      - Read process:
          1. Query Local: up-to-date check.
          2. Find Position: determine the highest log position and select a replica.
          3. Catchup: check the consensus value from other replicas.
          4. Validate: synchronize with the up-to-date state.
          5. Query Data: read the data with the timestamp.
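The five read steps can be sketched in a few lines. This is a heavily simplified, in-memory illustration: the `up_to_date` flag stands in for the up-to-date check, the sketch always reads from the local replica, and it asks every replica for its log instead of running a majority read. `Replica` and `current_read` are hypothetical names:

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.log = {}           # log position -> committed mutation
        self.up_to_date = True  # stand-in for the local up-to-date check

    def highest_position(self):
        return max(self.log, default=-1)


def current_read(local, replicas):
    """Return the mutations visible to a current read at the local replica."""
    if local.up_to_date:
        # 1. Query Local: the local replica already has every committed write.
        highest = local.highest_position()
    else:
        # 2. Find Position: highest log position known to any replica.
        highest = max(r.highest_position() for r in replicas)
        # 3. Catchup: fetch values the local replica is missing from the others.
        #    (Assumes every position up to `highest` exists on some replica.)
        for pos in range(highest + 1):
            if pos not in local.log:
                local.log[pos] = next(r.log[pos] for r in replicas if pos in r.log)
        # 4. Validate: the local replica is now known to be current.
        local.up_to_date = True
    # 5. Query Data: read everything up to the selected log position.
    return [local.log[pos] for pos in range(highest + 1)]


a, b, c = Replica("a"), Replica("b"), Replica("c")
a.log = {0: "x", 1: "y"}
b.log = {0: "x", 1: "y"}
c.log = {0: "x"}
c.up_to_date = False  # c missed the write at position 1
print(current_read(c, [a, b, c]))
```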
  • 33. Megastore: Replication. Megastore Write Process
      - Each replica stores mutations and metadata for the log entries.
      - Write process:
          1. Accept Leader: ask the leader to accept the value as proposal number zero.
          2. Prepare: run the Paxos Prepare phase at all replicas.
          3. Accept: ask the remaining replicas to accept the value.
          4. Invalidate: fault handling for replicas that did not accept the value.
          5. Apply: apply the value’s mutation at as many replicas as possible.