Database Consistency Models
ACID
● Atomicity: each transaction is "all or nothing" (Commit or
rollback)
● Consistency: any transaction will bring the database from one
valid state to another (Preserves relational integrity)
● Isolation: concurrent execution of transactions results in a system
state that would be obtained if transactions were executed serially
● Durability: once committed, changes persist on disk (rebooting
doesn't cause data loss, for example)
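
As a minimal sketch of atomicity, the example below uses Python's built-in sqlite3 module. The `accounts` table, the `transfer` helper and the balances are assumptions made up for illustration; they are not from the slides. The second transfer credits one account and then fails a CHECK constraint on the debit, so the whole transaction is rolled back.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        # The connection context manager commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit is undone too.
        return False

# Hypothetical schema: balances may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # commits
failed = transfer(conn, "alice", "bob", 500)  # rolls back entirely
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After both calls, only the first transfer is visible: the failed one leaves no partial credit behind.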
Examples
● Traditional relational databases:
● Oracle
● SQL Server
● MySQL
● Etc.
● Some NewSQL databases:
● VoltDB
● AltiBase
Deficiencies of ACID
● Difficult to maintain high availability & fault
tolerance in distributed scenarios
● CAP Theorem
● Huge performance overhead in distributed
synchronization
● Huge performance overhead to maintain integrity
CAP Theorem
(Brewer's conjecture)
● In plain English:
"...during a network partition, a distributed system
must choose either Consistency or Availability." --
foundationdb.com
CAP Theorem
(Brewer's conjecture)
● Assume that you want strong consistency.
● This implies synchronous, blocking updates.
● Assume you also want availability
● This implies multiple nodes with redundancies.
● When you update one node, you need to broadcast
synchronously to all other nodes and wait for
successful confirmations (very slow!)
● So far so good... But now a node fails to connect to
the others (network failure)!
● If you don't wait for it to come back, you've
sacrificed consistency. If you block on it, you've
sacrificed availability.
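
The walk-through above can be sketched as a toy, single-process simulation. The `Replica` class, the `write` function and the "CP"/"AP" mode names are hypothetical and exist only for this illustration; a real system would use a replication protocol rather than a loop.

```python
class Replica:
    """A toy replica; `reachable = False` simulates a network partition."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.reachable = True

def write(replicas, value, mode):
    """Synchronously write `value` to every replica.

    mode "CP": refuse the write if any replica is unreachable
               (consistency kept, availability sacrificed).
    mode "AP": accept the write on the reachable replicas only
               (availability kept, consistency sacrificed).
    """
    partitioned = [r for r in replicas if not r.reachable]
    if partitioned and mode == "CP":
        return False
    for r in replicas:
        if r.reachable:
            r.value = value
    return True

nodes = [Replica("a"), Replica("b"), Replica("c")]
write(nodes, "v1", "CP")        # all nodes reachable: everyone gets v1

nodes[2].reachable = False      # node c is cut off from the others
cp_result = write(nodes, "v2", "CP")  # blocked: the system is unavailable
ap_result = write(nodes, "v2", "AP")  # accepted: node c is now stale
values = [r.value for r in nodes]
```

In "AP" mode the write succeeds but node c still holds the old value, which is exactly the inconsistency the slide describes.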
BASE
● Basically available: there will be a response to any request, but
that response could still be ‘failure’ to obtain the requested data or
the data may be in an inconsistent or changing state.
● Soft state: even during times without input there may be changes
going on due to ‘eventual consistency,’ thus the state of the
system is always ‘soft.’
● Eventually consistent: "the storage system guarantees that if no
new updates are made to the object, eventually all accesses will
return the last updated value." -- Werner Vogels, CTO of Amazon.com
Safety versus Liveness
● Liveness: a value distributed across systems eventually converges
to be the same across those same systems (generally the last
update value).
● "Something good eventually happens"
● Safety: the system is at all times consistent.
● "Nothing bad ever happens"
● Eventual consistency is purely a liveness guarantee (reads
eventually return the same value) and does not make safety
guarantees: an eventually consistent system can return any value
before it converges.
Safety versus Liveness
● To be clear: under eventual consistency, by default, two
concurrent read-modify-write increments of a standard
counter can end up increasing it by only 1.
● The last write wins, but there is no guarantee with
regard to what happened in between (and both writers may
have read the value when it wasn't consistent)
● This is what happens when you don't have any safety
guarantee, as in eventual consistency.
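
The lost increment can be reproduced with a few lines of Python. This is a deliberately simplified sketch: the `store` dictionary and the `read`/`write` helpers are hypothetical stand-ins for a last-write-wins key-value store, and the interleaving of the two clients is written out by hand.

```python
# A toy last-write-wins store: a write simply overwrites the value.
store = {"counter": 0}

def read(key):
    return store[key]

def write(key, value):
    store[key] = value  # no check against what the writer originally read

# Two clients increment concurrently: both read before either writes.
a_seen = read("counter")
b_seen = read("counter")
write("counter", a_seen + 1)
write("counter", b_seen + 1)  # overwrites client A's increment

final = store["counter"]
```

Two increments were issued, yet the counter only moved from 0 to 1.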
Examples
● Most big social media websites
● Google Cloud Datastore
● Most NoSQL databases:
● Riak, Redis, Hadoop (without HBase), Couchbase,
MongoDB (in some configurations), Cassandra (in some
configurations)
● Etc.
● Amazon's DynamoDB
● DNS (Domain Name System)
Deficiencies of BASE
● Delay in convergence
● No safety guarantee
● You don't have the same update semantics as in ACID
transactions
Solutions to BASE's Problems
● Application developers can write compensation logic
● Okay in small, simple applications
● Quickly becomes unmanageable in complex applications
● ACID 2.0 design principles that guarantee ACID-like
consistency even with an eventual consistency
mechanism.
Mutable shared states are the root of all evil.
ACID 2.0
● Associativity & Commutativity: the messages in the queue can
be processed in any order.
● Idempotence: the message queue can use at-least-once delivery
guarantees (retry logic). Processing the same message more than
once doesn't change the result.
● Distributed: refers to the fact that ACID 2.0 applies to distributed
systems.
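
As a small illustration of these properties, the sketch below applies messages to a set using set union, which is associative, commutative and idempotent. The message strings and the `apply` function are assumptions chosen for the example. Every delivery order is tried, and one message is redelivered each time to mimic a retry.

```python
import itertools

def apply(state, message):
    """Fold one message into the state using set union."""
    return state | {message}

messages = ["add:x", "add:y", "add:z"]

results = set()
for order in itertools.permutations(messages):
    state = frozenset()
    # Redeliver the first message at the end, as an at-least-once
    # queue might do after a timeout.
    for message in order + order[:1]:
        state = apply(state, message)
    results.add(state)
```

`results` ends up holding a single state: neither reordering nor duplicate delivery changed the outcome.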
What does it mean?
● Unlike ACID and BASE, ACID 2.0 doesn't tell you
what the guarantees are; instead it tells you that
certain design principles are immune to
transactional integrity issues.
● In particular, immutable data structures that you
transform are easier to handle than mutable shared
states (as most functional programming languages
have understood)
The CALM Theorem
● Consistency as Logical Monotonicity
● Logically monotonic: intuitively, a monotonic program
(or data structure) makes forward progress over time: it
never "retracts" an earlier conclusion in the face of new
information.
● Implementation is usually through a class of data
structures referred to as CRDTs (conflict-free
replicated data types)
Example: the PN-Counter
● Counts the number of increment and decrement calls
per transaction (or "actor", or "node")
● When the value is read, it's calculated on the fly by
summing up the number of increment "marks" and
subtracting the number of decrement "marks"
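
A minimal PN-Counter sketch in Python follows. It assumes the common state-based formulation, in which each node keeps per-node increment and decrement tallies and replicas merge by taking the per-node maximum. The class and method names are illustrative and do not come from any particular library.

```python
class PNCounter:
    """State-based PN-Counter: per-node increment and decrement tallies."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.p = {}  # node id -> number of increments seen from that node
        self.n = {}  # node id -> number of decrements seen from that node

    def increment(self, amount=1):
        self.p[self.node_id] = self.p.get(self.node_id, 0) + amount

    def decrement(self, amount=1):
        self.n[self.node_id] = self.n.get(self.node_id, 0) + amount

    def value(self):
        # Total increments minus total decrements, computed on read.
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        # Per-node maximum is associative, commutative and idempotent,
        # so merges can arrive in any order and any number of times.
        for mine, theirs in ((self.p, other.p), (self.n, other.n)):
            for node, count in theirs.items():
                mine[node] = max(mine.get(node, 0), count)

a = PNCounter("a")
b = PNCounter("b")
a.increment()
a.increment()
b.increment()
b.decrement()

a.merge(b)
b.merge(a)
a.merge(b)  # a repeated merge changes nothing
```

Both replicas converge to the same value, 2 + 1 - 1 = 2, and no increment is lost, unlike the last-write-wins counter.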
Example: Bitcoin
● The Bitcoin transaction ledger can be viewed as a CRDT:
it's an append-only structure.
● The ledger contains the history of all transactions
ever made: and it's a replicated dataset, updated by
appending new transactions in a peer-to-peer
"eventual consistency" framework.
Example: Apache Spark RDDs
● Spark is a high-performance distributed computing
framework
● Big Data analytics
● Machine learning (MLlib)
● Distributed graph processing (GraphX)
● Spark SQL
● It can replace Hadoop MapReduce (reported to be about
30 to 100 times faster, depending on the workload)
● The essence of the Spark framework is a type of data
structure called a Resilient Distributed Dataset
(which is a CRDT).
Example: Apache Spark RDDs
● RDD features:
● Immutable
● Distributed / Replicated
● Expose map(), filter(), reduce(), join() operations to
produce new derived RDDs (very "functional"
rather than object-oriented – written in Scala)
● Logs "lineage" information (how the RDD was
constructed) across partitions, rather than the data
itself, for efficiency. If a network fault occurs, it
can reconstruct the data through that lineage. This
way the cost of data replication isn't generally
incurred (only in fault recovery scenarios).
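
The lineage idea can be sketched without Spark. The `LineageDataset` class below is a hypothetical, single-process stand-in and not the Spark API: each derived dataset remembers its parent and the transformation that produced it, so lost data can be recomputed instead of restored from a replica.

```python
class LineageDataset:
    """Toy immutable dataset that records how it was derived."""

    def __init__(self, source=None, parent=None, transform=None):
        self._source = source        # only set on the root dataset
        self._parent = parent        # dataset this one was derived from
        self._transform = transform  # function applied to the parent's rows
        self._cache = None           # materialized rows, if any

    def map(self, fn):
        return LineageDataset(
            parent=self, transform=lambda rows: [fn(r) for r in rows]
        )

    def filter(self, pred):
        return LineageDataset(
            parent=self, transform=lambda rows: [r for r in rows if pred(r)]
        )

    def collect(self):
        # Materialize on demand by replaying the lineage from the root.
        if self._cache is None:
            if self._parent is None:
                self._cache = list(self._source)
            else:
                self._cache = self._transform(self._parent.collect())
        return list(self._cache)

    def lose_data(self):
        # Simulate a fault: the materialized rows are gone,
        # but the lineage (parent + transform) survives.
        self._cache = None

base = LineageDataset(source=range(10))
evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

first = evens_squared.collect()
evens_squared.lose_data()
recomputed = evens_squared.collect()  # rebuilt from lineage, not a replica
```

Because the datasets are immutable and the transformations are deterministic, replaying the lineage yields the same rows as before the fault.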
Other examples
● Apache Kafka message queue
● Riak vector clocks for synchronization
● The game League of Legends uses Riak CRDTs for its
in-game chat system
● TreeDoc and Logoot: for collaborative text editing
● SoundCloud uses a CRDT set for streaming,
implemented on top of Redis
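
To give a feel for how vector clocks support synchronization, here is a generic comparison sketch. It is not Riak's API: the `compare` function and the dictionary representation (node id mapped to an event count) are assumptions made for illustration.

```python
def compare(a, b):
    """Compare two vector clocks given as {node_id: counter} dicts.

    Returns "equal", "before" (a happened before b), "after"
    (a happened after b) or "concurrent" (neither saw the other,
    so the two versions conflict and must be reconciled).
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"

ordered = compare({"a": 1}, {"a": 2, "b": 1})
conflict = compare({"a": 2}, {"a": 1, "b": 1})
```

An ordered pair can be resolved by keeping the later version, while a concurrent pair is where a CRDT merge (or application logic) is needed.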
Deficiencies of CRDTs
● Not a universal solution: doesn't cover all possible
applications
● Garbage collection issues (append-only means it
consumes increasing amounts of space!)
● Complex to design
Some solutions
● Bloom programming language
● Provides a "framework" to develop in a commutative,
order-insensitive way that favors CRDT-style data
structures.
● Existing distributed computing platforms do the
complicated work for us (Apache Spark, for example)
● We still need to accept locking ACID or weakly
consistent BASE for some parts of the system. We
can also resort to better "compromises" such as
causal consistency.
