Data Patterns in
Microservice Applications
Ryan Knight - CEO / CTO Grand Cloud
@knight_cloud
Ryan Knight
● CEO / CTO of Grand Cloud - Boutique consulting company working
at the intersection of Distributed Systems and Data Engineering
● Experience ranges across traditional software development and
architecture to sales engineering, consulting, solution architecture
and developer advocacy.
● Worked across a wide range of companies from small startups such as
Lightbend and DataStax to Large Corporations such as Starbucks
and Capital One.
● Consulting Experience spans over 50 companies and 10 Countries
● Currently Consulting at Brighthouse Financial
Distributed System Design
Heart of distributed system design is a requirement
for a consistent, performant, and reliable way of
managing data - Jonas Bonér
Cloud Native -> New Requirements
Users: 1 million+ Data volume: TB–PB–EB
Locality: Global
Performance: Milliseconds–microseconds
Request rate: Millions
Access: Web, mobile, IoT, devices
Scale: Up-down, Out-in
Economics: Pay for what you use
Developer access: No assembly required
Challenges with Data
Consistency
● RDBMS
● CAP Theorem
● Trade-off between
Consistency and Scale
● Rise of Eventual Consistency
● NoSQL Databases
EASY
COMPLEX
ACID TXN / Strong
Consistency
Eventual Consistency
(D)evolution of Consistency
Challenges
with Eventual
Consistency
Credit to this tweet
CAP Theorem
Challenges with Application Tier Consistency
● Consistency problems are far harder to solve in the application tier
● Increased Corner Case Bugs
○ Consistency is really hard to get right in the Application Tier!
○ Consistency is really hard to test and verify
● Increased Complexity
Business Impact of Consistency
● Travel Booking of Flight, Hotel, etc. - Inconsistencies could either
lead to double bookings or lost bookings.
● Rewards Program - Very difficult to prevent fraudulent redemptions.
Potential for monetary loss.
● Physical Allocation of Resources vs. Digital Realm
● Inventory / Limited Sales
Direct Business Value of “Strong Consistency”
● Increases accuracy of sales and reduces lost business revenue
● Cost Savings with reduced operational complexity and increased
visibility into business operations.
● Weak Consistency is a Security Concern - Possible financial loss
from inconsistent views of data.
● ACIDRain Attack - Todd Warszawski, Peter Bailis
○ 22 critical ACIDRain attacks that allow attackers to corrupt store
inventory, over-spend gift cards, and steal inventory.
○ Bankrupt popular Bitcoin exchange
Eventual Consistency
● Internet of Things
● Media
● Retail
● Real-time Analytics
● Time-Series
● Monitoring
● Customer 360
Strong Consistency
● Financial Transactions
● Rewards Programs
● Inventory Management
● Global Meta-Data
● Travel Reservations
● Gaming
● Billing / Payments
● Ad Tech
What is Data Consistency?
Challenges with Understanding Consistency
● Lots of Definitions of Consistency
● Consistency in ACID is about enforcing invariants
○ Data must be valid according to all defined rules
○ Not the consistency we are looking for
● "Strong consistency" - term used to differentiate full
consistency from weaker levels of consistency such as
causal or session consistency.
Consistency Challenges
Dirty Reads - Read Uncommitted Write
Read Skew / Non-Repeatable Reads
Read your own Writes
Lost Updates
Write Skew
Write Skew
Two concurrent
transactions each
determine what they are
writing based on reading
a data set which overlaps
what the other is writing
begriffs.com
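A minimal sketch of write skew, assuming a hypothetical on-call roster whose invariant is that at least one doctor stays on call. The roster, the names, and the request_leave helper are illustrative and not from the talk. Both transactions read the same snapshot, each sees another doctor on call, and each writes a different row:

```python
# Hypothetical roster; invariant: at least one doctor must remain on call.
on_call = {"alice": True, "bob": True}

def request_leave(snapshot, doctor):
    """Decide on a write using a (possibly stale) snapshot of the roster."""
    others_on_call = sum(
        1 for name, is_on in snapshot.items() if is_on and name != doctor
    )
    if others_on_call >= 1:
        return {doctor: False}  # write set touches only this doctor's row
    return {}

# Both transactions take their snapshot before either one commits.
snapshot_1 = dict(on_call)
snapshot_2 = dict(on_call)

writes_1 = request_leave(snapshot_1, "alice")
writes_2 = request_leave(snapshot_2, "bob")

# The write sets are disjoint, so a write-write conflict check
# (as in snapshot isolation) lets both transactions commit.
on_call.update(writes_1)
on_call.update(writes_2)

invariant_holds = any(on_call.values())
print(on_call, invariant_holds)
```

Neither transaction overwrote the other's data, yet the combined result breaks the invariant. A serializable isolation level would have to abort one of the two.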
Consistency Models
Credit to Peter Bailis and Aphyr at jepsen.io
http://www.bailis.org/blog/linearizability-versus-serializability/
Linearizability
● Guarantees that the order of reads and writes to a single
register or row will always appear the same on all nodes.
● Appearance that there is only one copy of the data.
● It doesn’t group operations into transactions.
● Guarantees read-your-write behavior.
Linearizable Consistency in CAP
● CAP Theorem is about “atomic consistency”
● Atomic consistency refers only to a property of a single
request/response operation sequence.
● Strong Consistency in CAP is Linearizability
Serializable Consistency
● Transaction Isolation
● Database guarantees that two transactions have the same
effect as if they were run serially.
● multi-operation, multi-object, arbitrary total order
Strict Serializability
● Linearizability plus Serializability provides Strict
Serializability
● Highest level of Consistency
● Guarantee ordering and transaction isolation
Linearizable vs. Serializable Consistency
● Serializability - multi-operation, multi-object, arbitrary total order
● Linearizability - single-operation, single-object, real-time order
● Strict Serializability - Linearizability plus Serializability provides Strict
Serializability
Peter Bailis - Linearizability versus Serializability
No One Solution to Consistency
● Do you want your data right or right now? - Pat Helland
● PACELC Theorem -> More than CAP
○ In the absence of network partitions the trade-off is
between latency and consistency - Daniel Abadi
● Evaluate trade-offs in the differing approaches
Data Consistency in
Microservices and
Serverless
From Monolith to Microservices to Serverless
● Data Consistency was easy in a monolith application -
single source of truth w/ ACID transactions
● With the move to microservices, each service became a bounded
context that owns and manages its data.
● Data Consistency became very difficult w/ microservices
● Serverless increases the complexity even more
Consistency Challenges with Data in Microservices
● Traditional ACID transactions did not scale
● Data orchestration between multiple services - Number of
Microservices Increases Number of Interactions
● Stateful or Stateless
● Data rehydration for things like service failures and rolling
updates.
Popularity of Eventual Consistency
CAP Theorem
• Forces a choice between Global Scale and Strong Consistency
Eventual Consistency
• Sacrificed consistency for availability and partition tolerance.
• Really a Necessary Evil
• Write now and figure it out later
Pushed complexity of managing consistency to application tier
Rise of Managing
Consistency
in the Database
Value of Consistency in the Database
● Decrease Application Tier Complexity
● Reduce Cognitive Overhead
● Increased Developer Productivity
● Increased Focus on Business Value
● Most implementations also provide strong atomicity and isolation
● Push complexity of consistency back to the database
● Not a panacea for all data consistency challenges
Case Study - AdStage
● Recently migrated from Cassandra to Postgres
● Leverage Postgres DB Transactions
● Found Postgres to be extremely capable with advanced
data model and query capabilities
● Significant decrease in application and operational
complexity
● Significantly reduced operational costs
Leveraging DB Consistency
● Ledger Pattern with Compare and Swap Like Operation
● Application reads latest ledger id from DB
● Application makes an update with what it thinks is the latest
ledger id plus one
● DB transaction / stored procedure to read the last ledger id and
make the update if the ledger id is greater than the last entry
● If update fails DB returns correct Ledger ID
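The ledger steps above can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the real DB, and the table layout and the helper names (open_ledger, latest_id, append_entry) are hypothetical. The read of the last ledger id and the insert happen inside one transaction, and a failed append returns the current last id so the caller can retry:

```python
import sqlite3

def open_ledger():
    """Create a hypothetical ledger table with one existing entry (id 1)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ledger (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO ledger (id, amount) VALUES (1, 100)")
    conn.commit()
    return conn

def latest_id(conn):
    return conn.execute("SELECT COALESCE(MAX(id), 0) FROM ledger").fetchone()[0]

def append_entry(conn, proposed_id, amount):
    """Insert only if proposed_id is greater than the last ledger id.

    Returns (True, proposed_id) on success, or (False, last_id) so the
    caller learns the correct ledger id and can retry.
    """
    with conn:  # commits on normal exit, rolls back on an exception
        conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
        last = latest_id(conn)
        if proposed_id <= last:
            return False, last
        conn.execute(
            "INSERT INTO ledger (id, amount) VALUES (?, ?)", (proposed_id, amount)
        )
        return True, proposed_id

conn = open_ledger()
seen = latest_id(conn)                         # two clients both read id 1
first = append_entry(conn, seen + 1, 25)       # first writer wins
second = append_entry(conn, seen + 1, 40)      # stale id is rejected
retry = append_entry(conn, second[1] + 1, 40)  # retry with the returned id
print(first, second, retry)
```

The design choice is that the comparison lives next to the data, so the application never has to coordinate writers itself; it only has to handle the rejected case.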
Traditional / Hybrid NoSQL DB’s
● Cloud Operated Relational DB’s are a re-emerging trend.
● Cloud SQL w/ Postgres or MySQL
● AWS Aurora - Amazon re-designed MySQL as a
cloud-native relational database
● AWS DynamoDB w/ Transactions - Multiple Objects, limited
to a single region
Next Generation Databases
● Google Spanner - Horizontally scalable, globally consistent, relational database
service. Relies on Proprietary Atomic Clocks and Low Latency Network.
● CockroachDB & YugaByte - Open Source versions of Spanner with 2 Phase
Commits and Hybrid-Logical Clocks
● Fauna - Single Phase Commit with no hard dependency on clocks
● FoundationDB - Serializable Optimistic MVCC concurrency. Loosely based on
Google Percolator
● TiDB - Hybrid Transactional and Analytical Processing (HTAP) workloads.
Features “horizontal scalability, strong consistency, and high availability.”
● Microsoft Azure Cosmos DB - Configurable consistency guarantees
Transactions are hard. Distributed transactions are
harder. Distributed transactions over the WAN are
final boss hardness. I'm all for new DBMSs but
people should tread carefully. - Andy Pavlo
New Generation / Global Transactional Databases
Not All Global Databases are the Same
● Differences in Transaction Protocol
● Global Ordering Done in a Single Phase vs. Multi-Phase
● Pre or Post Commit Transaction Resolution
● Different levels of consistency
● Maximum scope of a transaction - Single Record vs. Multiple
Records
● Geographic limits of transactions - Single Region vs. Global?
● Storage Layer is an entirely other discussion beyond the
transaction protocol. Large impact on performance and stability!
Weak Isolation Level
Scope of Transaction -
Single Row
Eventually Consistent
Strongest Isolation Level
Scope of Transaction -
Distributed Across
Partitions
Serializable Consistency
Consistency and the ACID Spectrum
Consistency Levels in Next Gen Databases - 1/2
● Google Spanner - External strong consistency across rows, regions, and
continents.
● Yugabyte - snapshot isolation, not serializability yet, writes must go to
partition leaders. Reliance on hybrid clocks makes it difficult to run in
virtualized environments.
● Cockroach - serializability but not strict serializability, reads and writes must
go to partition leaders, no replica reads allowed
Consistency Levels in Next Gen Databases - 2/2
● TiDB - read-committed within a datacenter, no serializability, timestamp oracle
must issue leases for all write transactions, replica reads unclear
● FoundationDB - Serializable Snapshot Isolation and strictly serializable within a
datacenter, timestamp oracle must issue leases for all serializable reads and
all writes, snapshot reads possible
● FaunaDB - Global pre-ordering of transactions provides strict serializable consistency
● Azure Cosmos DB - Five consistency models allow developer to choose between
latency and consistency. Highest Level of consistency is strong consistency with
linearizability guarantees. Doesn’t seem to be strict serializable?
Adventures in Application
Tier Consistency
Application Tier Consistency
Write now and figure it out later
Advantages of Application Tier Consistency
● Low Read / Write Latency
● High-Throughput
● Read your Writes - Same session only
● Requires application to enforce session stickiness
Disadvantages of Application Tier Consistency
● Consistency problems are far harder to solve in the
application tier
● Increased Complexity
● No Isolation and limited atomicity
● Corner Case Bugs - Consistency is really hard to test and
verify
● No magic pattern or technology that you can sprinkle on
data to make it consistent.
Options for Application Tier Consistency
● Serialization Points - e.g. Kafka Consumers pinned to session id’s.
● Akka Clustering - Stateful Services pinned to a client id.
● CRDT - Conflict Free Replicated Data Types, i.e. Associative
Counters. Data must be of a certain shape to work.
● Event Sourcing / Append Only Logging with Aggregates for running
totals. Hard to provide consistency guarantees across aggregates.
● Saga Pattern - Builds on Event Sourcing and uses a Central
Coordinator to manage complex transaction logic. Relies heavily on
idempotent services that can roll back transactions in the face of
failures.
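As a rough illustration of the first and fourth options, the sketch below routes every event for a session to one partition (a serialization point) and folds that partition's append-only log into running totals. The partition count, the CRC32-based routing, and the function names are assumptions made for the example; this is not Kafka's actual partitioner or client API:

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count

def partition_for(session_id):
    """Stable hash so one session always maps to the same partition."""
    return zlib.crc32(session_id.encode("utf-8")) % NUM_PARTITIONS

# One append-only log per partition; each log is read by a single consumer.
partitions = defaultdict(list)

def publish(session_id, amount):
    partitions[partition_for(session_id)].append((session_id, amount))

def consume(partition):
    """Fold the partition's events, in order, into running totals per session."""
    totals = defaultdict(int)
    for session_id, amount in partitions[partition]:
        totals[session_id] += amount
    return dict(totals)

publish("session-a", 10)
publish("session-b", 5)
publish("session-a", -3)

totals = consume(partition_for("session-a"))
print(totals["session-a"])
```

Ordering is only guaranteed within a partition, which is why totals that span several sessions or aggregates are hard to keep consistent with this approach.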
Patterns for Application Tier Consistency
● Kafka Consumer Serialization Points
● Akka Clustering w/ Cluster Singletons
● CRDT - Conflict Free Replicated Data Types
● Event Sourcing / Append Only Logging
with Aggregates
● CQRS
● Saga Pattern
● Custom Distributed Transactions
WIRED
TIRED
CRDT’s
● CRDT - Conflict Free Replicated Data Types
● Data types that guarantee convergence to the same value without any
synchronization mechanism
● Consistency without Consensus
● Avoid distributed locks, two-phase commit, etc. Data Structure that
tells how to build the value
● Sacrifice linearizability (guaranteed ordering) while remaining correct
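A grow-only counter (G-Counter) is one of the simplest CRDTs and shows the convergence property in a few lines. The class below is a minimal sketch written for this example, not taken from any particular library. Each replica increments only its own slot, and merging takes the element-wise maximum, so merges can arrive in any order and be repeated safely:

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, n=1):
        # A replica only ever writes to its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        # max is commutative, associative, and idempotent.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(2)   # applied on replica a only
b.increment(3)   # applied on replica b only

a.merge(b)
b.merge(a)
a.merge(b)       # repeating a merge changes nothing

print(a.value(), b.value())
```

The counter can only grow, which is the sense in which the data has to be of a certain shape: supporting decrements or removals needs a different structure.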
Overview of Saga Pattern
● Central Coordinator
● Manages Complex Transaction Logic
● State managed in a distributed log
● Split work into idempotent executors / requests
● Requires compensating transactions for dealing with failures /
aborting transaction
● Effectively Once instead of Exactly Once
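The coordinator role can be sketched in a few lines. This is a simplified, single-process illustration: a plain list stands in for the distributed log, and the booking functions are hypothetical. Each step pairs an action with a compensating transaction, and a failure triggers the compensations of the completed steps in reverse order:

```python
def run_saga(steps, log):
    """Run (name, action, compensation) steps in order.

    On failure, run the compensations of the completed steps in reverse
    order and return False. Returns True if every step succeeds.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(("done", name))
            completed.append((name, compensate))
        except Exception as exc:
            log.append(("failed", name, str(exc)))
            for done_name, undo in reversed(completed):
                undo()
                log.append(("compensated", done_name))
            return False
    return True

state = {"flight": False, "hotel": False}

def book_flight():
    state["flight"] = True

def cancel_flight():
    state["flight"] = False

def book_hotel():
    raise RuntimeError("no rooms")  # simulated failure

def cancel_hotel():
    state["hotel"] = False

saga_log = []
ok = run_saga(
    [("flight", book_flight, cancel_flight), ("hotel", book_hotel, cancel_hotel)],
    saga_log,
)
print(ok, state, saga_log)
```

Between the flight booking and its compensation, other readers can observe the booked flight, which is the partially committed state a saga cannot hide.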
The Challenges with the Saga Pattern
● Consistency is reliant on the consistency of the distributed log
● Limited Consistency
● Weak Isolation
● No Guaranteed Atomicity - Unsafe partially committed states
● Complexity with versioning of Saga Logic
● Increased application complexity
● Rollback and recovery logic required in application tier
● Idempotency impossible for some services
● Effectively Once instead of Exactly Once
Addendum
Global Scale Next Gen
Databases
Spanner
● External consistency, an isolation level even stricter than strict serializability
● Relational Integrity Constraints
● 99.999% availability SLA
● Uses global commit timestamps to guarantee ordering of transactions via the
TrueTime API.
● Multiple Shards with 2PC
● Single Shard Avoids 2PC for Writes / Read-only Transactions also avoid 2PC
● No Downtime upgrades - Maintenance done by moving data between nodes
● Downside is cost and some limitations to the SQL model and schema design
CockroachDB
● Open source Database Inspired by Spanner
● Hybrid Logical Clock similar to a vector clock for ordering of transactions
● Challenges with clock skew - waits up to 250 ms on reads
● Provides linearizability on single key and overlapping keys
● For transactions that span disjoint sets of keys, it provides only serializability and not
linearizability
● Some edge cases cause anomalies called “causal reverse” - Jepsen
● “Enterprise-only” features like row-level replication zones
● Supports migrating by supporting PostgreSQL syntax and drivers, however it does
not offer exact compatibility.
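To make the hybrid logical clock bullet more concrete, here is a sketch of the general HLC update rules. It illustrates the idea only and is not CockroachDB's implementation; the class name and the injected physical_clock callable are assumptions for the example. A timestamp is a pair of the highest physical time seen and a logical counter that breaks ties:

```python
class HybridLogicalClock:
    def __init__(self, physical_clock):
        self.physical_clock = physical_clock  # callable returning an int (e.g. ms)
        self.l = 0  # highest physical time observed so far
        self.c = 0  # logical counter for events sharing the same l

    def now(self):
        """Timestamp a local or send event."""
        pt = self.physical_clock()
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1
        return (self.l, self.c)

    def update(self, remote):
        """Merge a timestamp received from another node."""
        rl, rc = remote
        pt = self.physical_clock()
        new_l = max(self.l, rl, pt)
        if new_l == self.l and new_l == rl:
            self.c = max(self.c, rc) + 1
        elif new_l == self.l:
            self.c += 1
        elif new_l == rl:
            self.c = rc + 1
        else:
            self.c = 0
        self.l = new_l
        return (self.l, self.c)

# Node b's wall clock is 10 units behind node a's.
a = HybridLogicalClock(lambda: 100)
b = HybridLogicalClock(lambda: 90)

sent = a.now()
received = b.update(sent)
later = b.now()
print(sent, received, later)
```

Even though node b's wall clock is behind, its timestamps after receiving the message sort after the sender's, so causally related events stay ordered. Events with no message between them can still be misordered by up to the clock skew, which is where the read waits come from.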
YugaByte
● Another Database Inspired by Spanner that relies on Hybrid Logical Clocks
● Currently only supports snapshot isolation
● Serializable isolation level work in progress
● Distributed Transactions to multiple partitions require a provisional record or
temporary table
FaunaDB - Consistency without Clocks
● Transaction resolution based on the Calvin protocol - pre-ordering of transactions
before commit
● Global transaction ordering provides serializable consistency
● Transactions can include multiple rows - not restricted to data in a single row or
shard
● Distributed log based algorithm scales throughput with cluster size by partitioning
the log
● Low Latency Snapshot Reads
● Proprietary Query Language with a high learning curve
● Optimistic concurrency model can cause a high number of failures under highly
contended workloads
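The pre-ordering idea can be illustrated with a toy example. This is a loose sketch of deterministic, log-ordered execution in the spirit of Calvin and not FaunaDB's implementation; the transfer and apply_log helpers and the account names are hypothetical. Once the order of transactions is agreed in the log, every replica applies the same deterministic logic and reaches the same state without coordinating at commit time:

```python
def transfer(src, dst, amount):
    """Build a deterministic transaction: its outcome depends only on state."""
    def txn(state):
        if state[src] >= amount:  # otherwise the transaction is a no-op
            state[src] -= amount
            state[dst] += amount
    return txn

def apply_log(log, initial):
    """Apply every transaction in log order to a copy of the initial state."""
    state = dict(initial)
    for txn in log:
        txn(state)
    return state

# The order is fixed in the log before any replica executes anything.
log = [transfer("a", "b", 70), transfer("a", "c", 50)]
initial = {"a": 100, "b": 0, "c": 0}

replica_1 = apply_log(log, initial)
replica_2 = apply_log(log, initial)
print(replica_1, replica_2)
```

The second transfer finds insufficient funds on both replicas, so both skip it and end in identical states.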
References
● Bla-bla-microservices-bla-bla http://jonasboner.com/bla-bla-microservices-bla-bla/
● Aphyr Strong consistency models -
https://aphyr.com/posts/313-strong-consistency-models
● Achieving ACID Transactions in a Globally Distributed Database from FaunaDB
● Peter Bailis - Linearizability versus Serializability
● Calvin: fast distributed transactions for partitioned database systems

Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 

Data Patterns

  • 1. Data Patterns in Microservice Applications Ryan Knight - CEO / CTO Grand Cloud @knight_cloud
  • 2. Ryan Knight ● CEO / CTO of Grand Cloud - Boutique consulting company working at the intersection of Distributed Systems and Data Engineering ● Experience ranges across traditional software development and architecture to sales engineering, consulting, solution architecture and developer advocacy. ● Worked across a wide range of companies from small startups such as Lightbend and DataStax to Large Corporations such as Starbucks and Capital One. ● Consulting Experience spans over 50 companies and 10 Countries ● Currently Consulting at Brighthouse Financial
  • 3. Distributed System Design Heart of distributed system design is a requirement for a consistent, performant, and reliable way of managing data - Jonas Bonér
  • 4. Cloud Native -> New Requirements Users: 1 million+ Data volume: TB–PB–EB Locality: Global Performance: Milliseconds–microseconds Request rate: Millions Access: Web, mobile, IoT, devices Scale: Up-down, Out-in Economics: Pay for what you use Developer access: No assembly required
  • 6. ● RDBMS ● CAP Theorem ● Trade-off between Consistency and Scale ● Rise of Eventual Consistency ● NoSQL Databases EASY COMPLEX ACID TXN / Strong Consistency Eventual Consistency (D)evolution of Consistency
  • 9. Challenges with Application Tier Consistency ● Consistency problems are far harder to solve in the application tier ● Increased Corner Case Bugs ○ Consistency is really hard to get right in the Application Tier! ○ Consistency is really hard to test and verify ● Increased Complexity
  • 10. Business Impact of Consistency ● Travel Booking of Flight, Hotel, etc. - Inconsistencies could either lead to double bookings or lost bookings. ● Rewards Program - Very difficult to prevent fraudulent redemptions. Potential for monetary loss. ● Physical Allocation of Resources vs. Digital Realm ● Inventory / Limited Sales
  • 11. Direct Business Value of “Strong Consistency” ● Increases accuracy of sales and reduces lost business revenue ● Cost Savings with reduced operational complexity and increased visibility into business operations. ● Weak Consistency is a Security Concern - Possible financial loss from inconsistent views of data. ● ACIDRain Attack - Todd Warszawski, Peter Bailis ○ 22 critical ACIDRain attacks that allow attackers to corrupt store inventory,over-spend gift cards, and steal inventory. ○ Bankrupt popular Bitcoin exchange
  • 12. Eventual Consistency ● Internet of Things ● Media ● Retail ● Real-time Analytics ● Time-Series ● Monitoring ● Customer 360 Strong Consistency ● Financial Transactions ● Rewards Programs ● Inventory Management ● Global Meta-Data ● Travel Reservations ● Gaming ● Billing / Payments ● Ad Tech
  • 13. What is Data Consistency?
  • 14. Challenges with Understanding Consistency ● Lots of Definitions of Consistency ● Consistency in ACID is about enforcing invariants ○ Data must be valid according to all defined rules ○ Not the consistency we are looking for ● "Strong consistency" - term used to differentiate full consistency from weaker levels of consistency such as causal or session consistency.
  • 15. Consistency Challenges Dirty Reads - Read Uncommitted Write Read Skew / Non-Repeatable Reads Read your own Writes Lost Updates Write Skew
  • 16. Write Skew Two concurrent transactions each determine what they are writing based on reading a data set which overlaps what the other is writing begriffs.com
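The write-skew definition above can be made concrete with a small simulation. This is a minimal sketch, not taken from the slides: the on-call-doctors scenario, the `request_leave` helper, and the dictionary standing in for a database are all illustrative assumptions. Each "transaction" reads its own snapshot, checks an invariant (at least one doctor stays on call), and writes a row the other transaction only reads.

```python
def request_leave(snapshot, doctor):
    """Decide what to write, based only on this transaction's snapshot.

    Hypothetical rule: a doctor may go off call only if at least one
    other doctor is still on call in the snapshot.
    """
    others_on_call = sum(1 for name, on in snapshot.items() if on and name != doctor)
    if others_on_call >= 1:
        return {doctor: False}
    return {}  # refuse: nobody else would be on call


# Concurrent execution under snapshot isolation (simulated):
# both transactions take their snapshot before either one commits.
db = {"alice": True, "bob": True}
snap_a = dict(db)
snap_b = dict(db)
write_a = request_leave(snap_a, "alice")  # sees bob on call, so allowed
write_b = request_leave(snap_b, "bob")    # sees alice on call, so allowed
db.update(write_a)
db.update(write_b)
concurrent_on_call = sum(db.values())  # invariant broken: nobody is on call

# Serial execution: the second transaction sees the first one's write.
serial_db = {"alice": True, "bob": True}
serial_db.update(request_leave(dict(serial_db), "alice"))
serial_db.update(request_leave(dict(serial_db), "bob"))
serial_on_call = sum(serial_db.values())  # invariant holds: bob stays on call
```

Neither transaction overwrites the other's row, so a lost-update check would not catch this; only serializable isolation (or an explicit lock on the rows that were read) prevents it.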
  • 17. Consistency Models Credit to Peter Bailis and Aphyr at jepsen.io http://www.bailis.org/blog/linearizability-versus-serializability/
  • 18. Linearizability ● Guarantees that the order of reads and writes to a single register or row will always appear the same on all nodes. ● Appearance that there is only one copy of the data. ● It doesn’t group operations into transactions. ● Guarantees read-your-write behavior.
  • 19. Linearizable Consistency in CAP ● CAP Theorem is about “atomic consistency” ● Atomic consistency refers only to a property of a single request/response operation sequence. ● Strong Consistency in CAP is Linearizability
  • 20. Serializable Consistency ● Transaction Isolation ● Database guarantees that two transactions have the same effect as if they were run serially. ● multi-operation, multi-object, arbitrary total order
  • 21. Strict Serializability ● Linearizability plus Serializability provides Strict Serializability ● Highest level of Consistency ● Guarantee ordering and transaction isolation
  • 22. Linearizable vs. Serializable Consistency ● Serializability - multi-operation, multi-object, arbitrary total order ● Linearizability - single-operation, single-object, real-time order ● Strict Serializability - Linearizability plus Serializability provides Strict Serializability Peter Bailis - Linearizability versus Serializability
  • 23. No One Solution to Consistency ● Do you want your data right or right now? - Pat Helland ● PACELC Theorem -> More than CAP ○ In the absence of network partitions the trade-off is between latency and consistency - Daniel Abadi ● Evaluate trade-offs in the differing approaches
  • 25. From Monolith to Microservices to Serverless ● Data Consistency was easy in a monolith application - single source of truth w/ ACID transactions ● With the move to microservices, each service became a bounded context that owns and manages its data. ● Data Consistency became very difficult w/ microservices ● Serverless increases the complexity even more
  • 26. Consistency Challenges with Data in Microservices ● Traditional ACID transactions did not scale ● Data orchestration between multiple services - Number of Microservices Increases Number of Interactions ● Stateful or Stateless ● Data rehydration for things like service failures and rolling updates.
  • 27. Popularity of Eventual Consistency CAP Theorem • Force choice between Global Scale or Strong Consistency Eventual Consistency • Sacrificed consistency for availability and partition tolerance. • Really a Necessary Evil • Write now and figure it out later Pushed complexity of managing consistency to application tier
  • 29. Value of Consistency in the Database ● Decrease Application Tier Complexity ● Reduce Cognitive Overhead ● Increased Developer Productivity ● Increased Focus on Business Value ● Most implementations also provide strong atomicity and isolation ● Push complexity of consistency back to the database ● Not a panacea for all data consistency challenges
  • 30. Case Study - AdStage ● Recently migrated from Cassandra to Postgres ● Leverage Postgres DB Transactions ● Found Postgres to be extremely capable with advanced data model and query capabilities ● Significant decrease in application and operational complexity ● Significantly reduced operational costs
  • 31. Leveraging DB Consistency ● Ledger Pattern with Compare and Swap Like Operation ● Application reads latest ledger id from DB ● Application makes an update with what it thinks is the latest ledger id plus one ● DB transaction / stored procedure to read the last ledger id and make the update if the ledger id is greater than the last entry ● If update fails DB returns correct Ledger ID
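The ledger steps above can be sketched with Python's built-in `sqlite3` module. This is an illustrative assumption rather than the setup used in the talk: the `ledger` table, its columns, and the helpers `open_ledger`, `latest_id`, and `append_entry` are hypothetical names, and SQLite stands in for whichever transactional database is actually in use.

```python
import sqlite3


def open_ledger():
    # isolation_level=None puts the connection in autocommit mode, so
    # transactions are opened and closed explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE ledger ("
        "id INTEGER PRIMARY KEY, account TEXT NOT NULL, amount INTEGER NOT NULL)"
    )
    return conn


def latest_id(conn):
    """Return the highest ledger id, or 0 for an empty ledger."""
    return conn.execute("SELECT COALESCE(MAX(id), 0) FROM ledger").fetchone()[0]


def append_entry(conn, proposed_id, account, amount):
    """Compare-and-swap style append.

    Inserts the entry only if proposed_id is greater than the last entry.
    Returns (True, proposed_id) on success, or (False, current_last_id)
    so the caller can retry with the correct id.
    """
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        current = latest_id(conn)
        if proposed_id <= current:
            conn.execute("ROLLBACK")
            return False, current
        conn.execute(
            "INSERT INTO ledger (id, account, amount) VALUES (?, ?, ?)",
            (proposed_id, account, amount),
        )
        conn.execute("COMMIT")
        return True, proposed_id
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = open_ledger()
seen = latest_id(conn)                                     # both writers read id 0
first = append_entry(conn, seen + 1, "acct-1", 100)        # wins the race
stale = append_entry(conn, seen + 1, "acct-1", -30)        # rejected, stale id
retry = append_entry(conn, stale[1] + 1, "acct-1", -30)    # retry with returned id
```

The read and the conditional insert happen inside one database transaction, so the check cannot be invalidated between the two steps; the application only has to handle the retry.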
  • 32. Traditional / Hybrid NoSQL DB’s ● Cloud Operated Relational DB’s are a re-emerging trend. ● Cloud SQL w/ Postgres or MySQL ● AWS Aurora - Amazon re-designed MySQL as a cloud-native relational database ● AWS Dynamo w/ Transactions - Multiple Object with limits to single region
  • 33. Next Generation Databases ● Google Spanner - Horizontally scalable, globally consistent, relational database service. Relies on Proprietary Atomic Clocks and Low Latency Network. ● Cockroach & YugaByte - Open Source versions of Spanner with 2 Phase Commits and Hybrid-Logical Clocks ● Fauna - Single Phase Commit with no hard dependency on clocks ● FoundationDB - Serializable Optimistic MVCC concurrency. Loosely based on Google Percolator ● TiDB - Hybrid Transactional and Analytical Processing (HTAP) workloads. Features “horizontal scalability, strong consistency, and high availability.” ● Microsoft Azure Cosmos DB - Configurable consistency guarantees
  • 34. Transactions are hard. Distributed transactions are harder. Distributed transactions over the WAN are final boss hardness. I'm all for new DBMSs but people should tread carefully. - Andy Pavlo New Generation / Global Transactional Databases
  • 35. Not All Global Databases are the Same ● Differences in Transaction Protocol ● Global Ordering Done in a Single Phase vs. Multi-Phase ● Pre or Post Commit Transaction Resolution ● Different levels of consistency ● Maximum scope of a transaction - Single Record vs. Multiple Records ● Geographic limits of transactions - Single Region vs. Global? ● Storage Layer is an entirely other discussion beyond the transaction protocol. Large impact on performance and stability!
  • 36. Weak Isolation Level Scope of Transaction - Single Row Eventually Consistent Strongest Isolation Level Scope of Transaction - Distributed Across Partitions Serializable Consistency Consistency and the ACID Spectrum
  • 37. Consistency Levels in Next Gen Databases - 1/2 ● Google Spanner - External strong consistency across rows, regions, and continents. ● Yugabyte - snapshot isolation, not serializability yet, writes must go to partition leaders. Reliance on hybrid clocks makes it difficult to run in virtualized environments. ● Cockroach - serializability but not strict serializability, reads and writes must go to partition leaders, no replica reads allowed
  • 38. Consistency Levels in Next Gen Databases - 2/2 ● TiDB - read-committed within a datacenter, no serializability, timestamp oracle must issue leases for all write transactions, replica reads unclear ● FoundationDB - Serializable Snapshot Isolation and strictly serializable within a datacenter, timestamp oracle must issue leases for all serializable reads and all writes, snapshot reads possible ● FaunaDB - Global pre-ordering of transactions provides strict serializable consistency ● Azure Cosmos DB - Five consistency models allow developer to choose between latency and consistency. Highest Level of consistency is strong consistency with linearizability guarantees. Doesn’t seem to be strict serializable?
  • 40. Application Tier Consistency Write now and figure it out later
  • 41. Advantages of Application Tier Consistency ● Low Read / Write Latency ● High-Throughput ● Read your Writes - Same session only ● Requires application to enforce session stickiness
  • 42. Disadvantages of Application Tier Consistency ● Consistency problems are far harder to solve in the application tier ● Increased Complexity ● No Isolation and limited atomicity ● Corner Case Bugs - Consistency is really hard to test and verify ● No magic pattern or technology that you can sprinkle on data to make it consistent.
  • 43. Options for Application Tier Consistency ● Serialization Points - i.e. Kafka Consumers pinned to session id’s. ● Akka Clustering - Stateful Services pinned to a client id. ● CRDT - Conflict Free Replicated Data Types, i.e. Associative Counters. Data must be of a certain shape to work. ● Event Sourcing / Append Only Logging with Aggregates for running totals. Hard to provide consistency guarantees across aggregates. ● Saga Pattern - Builds on Event Sourcing and uses a Central Coordinator to manage complex transaction logic. Relies heavily on idempotent services that can roll back transactions in the face of failures.
  • 44. Patterns for Application Tier Consistency ● Kafka Consumer Serialization Points ● Akka Clustering w/ Cluster Singletons ● CRDT - Conflict Free Replicated Data Types ● Event Sourcing / Append Only Logging with Aggregates ● CQRS ● Saga Pattern ● Custom Distributed Transactions WIRED TIRED
  • 45. CRDT’s ● CRDT - Conflict Free Replicated Data Types ● Data types that guarantee convergence to the same value without any synchronization mechanism ● Consistency without Consensus ● Avoid distributed locks, two-phase commit, etc. Data Structure that tells how to build the value ● Sacrifice linearizability (guaranteed ordering ) while remaining correct
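As one concrete instance of the convergence property described above, here is a minimal sketch of a grow-only counter (G-Counter), a standard CRDT. The class and method names are illustrative and not tied to any particular library. Each replica increments only its own slot, and merging takes the per-replica maximum, which is commutative, associative, and idempotent.

```python
class GCounter:
    """Grow-only counter CRDT: one monotonically increasing slot per replica."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # replica id -> number of increments seen from it

    def increment(self, n=1):
        # A replica only ever writes to its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max: safe to apply in any order, any number of times.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


a = GCounter("a")
b = GCounter("b")
a.increment(3)   # updates applied independently, with no coordination
b.increment(2)

a.merge(b)
b.merge(a)
a.merge(b)       # repeated merge changes nothing (idempotent)

value_a = a.value()
value_b = b.value()
```

Both replicas converge to the same total without locks or consensus. The cost is the one named on the slide: there is no global order of the individual increments, and the data has to fit a mergeable shape like this one.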
  • 46. Overview of Saga Pattern ● Central Coordinator ● Manages Complex Transaction Logic ● State managed in a distributed log ● Split work into idempotent executors / requests ● Requires compensating transactions for dealing with failures / aborting transaction ● Effectively Once instead of Exactly Once
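The coordinator role outlined above can be sketched in a few lines. This is a simplified, single-process illustration under stated assumptions: `SagaStep`, `run_saga`, and the booking functions are hypothetical names, and a plain Python list stands in for the distributed log. A real coordinator would persist the log durably and retry idempotent steps after a crash.

```python
class SagaStep:
    def __init__(self, name, action, compensation):
        self.name = name
        self.action = action              # forward operation, assumed idempotent
        self.compensation = compensation  # undoes the action after a later failure


def run_saga(steps, log):
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for step in steps:
        log.append(("start", step.name))
        try:
            step.action()
        except Exception:
            log.append(("failed", step.name))
            for done in reversed(completed):
                done.compensation()
                log.append(("compensated", done.name))
            return False
        completed.append(step)
        log.append(("done", step.name))
    return True


# Hypothetical travel booking: the hotel step fails after the flight succeeded.
state = {"flight": False, "hotel": False}


def reserve_flight():
    state["flight"] = True


def cancel_flight():
    state["flight"] = False


def reserve_hotel():
    raise RuntimeError("no rooms available")


def cancel_hotel():
    state["hotel"] = False


log = []
ok = run_saga(
    [
        SagaStep("flight", reserve_flight, cancel_flight),
        SagaStep("hotel", reserve_hotel, cancel_hotel),
    ],
    log,
)
```

Between the flight reservation and its compensation, other readers can observe the partially committed booking. That window is the weak isolation that sagas trade for availability.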
  • 47. The Challenges with the Saga Pattern ● Consistency is reliant on the consistency of the distributed log ● Limited Consistency ● Weak Isolation ● No Guaranteed Atomicity - Unsafe partially committed states ● Complexity with versioning of Saga Logic ● Increased application complexity ● Rollback and recovery logic required in application tier ● Idempotency impossible for some services ● Effectively Once instead of Exactly Once
  • 48. Data Patterns in Microservice Applications Ryan Knight - CEO / CTO Grand Cloud @knight_cloud
  • 50. Global Scale Next Gen Databases
  • 51. Spanner ● External consistency, an isolation level even stricter than strict serializability ● Relation Integrity Constraints ● 99.999% availability SLA ● Uses global commit timestamps to guarantee ordering of transactions via the TrueTime API. ● Multiple Shards with 2PC ● Single Shard Avoids 2PC for Writes / Read-only Transactions also avoid 2PC ● No Downtime upgrades - Maintenance done by moving data between nodes ● Downside is cost and some limitations to the SQL model and schema design
  • 52. CockroachDB ● Open source Database Inspired by Spanner ● Hybrid Logical Clock similar to a vector clock for ordering of transactions ● Challenges with clock skew - waits up to 250 ms on reads ● Provides linearizability on single key and overlapping keys ● For transactions that span disjoint sets of keys it provides only serializability, not linearizability ● Some edge cases cause anomalies called “causal reverse” - Jepsen ● “Enterprise-only” features like row-level replication zones ● Supports migrating by supporting PostgreSQL syntax and drivers, however it does not offer exact compatibility.
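For readers unfamiliar with hybrid logical clocks, the following is a minimal sketch of the general HLC update rules from the published algorithm, not CockroachDB's actual implementation. The class name, the injected `physical_clock` callable, and the fixed clock values are assumptions chosen to keep the example deterministic.

```python
class HybridLogicalClock:
    """Timestamp = (l, c): l tracks the largest physical time seen, c breaks ties."""

    def __init__(self, physical_clock):
        self._now = physical_clock  # callable returning the local physical time
        self.l = 0
        self.c = 0

    def tick(self):
        """Timestamp a local or send event."""
        prev = self.l
        self.l = max(prev, self._now())
        self.c = self.c + 1 if self.l == prev else 0
        return (self.l, self.c)

    def receive(self, remote):
        """Merge a timestamp carried on an incoming message."""
        remote_l, remote_c = remote
        prev = self.l
        self.l = max(prev, remote_l, self._now())
        if self.l == prev and self.l == remote_l:
            self.c = max(self.c, remote_c) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == remote_l:
            self.c = remote_c + 1
        else:
            self.c = 0
        return (self.l, self.c)


# Node b's physical clock is 10 units behind node a's (simulated skew).
node_a = HybridLogicalClock(lambda: 100)
node_b = HybridLogicalClock(lambda: 90)

sent = node_a.tick()             # a stamps an outgoing message
received = node_b.receive(sent)  # b adopts a's larger l and bumps c
later = node_b.tick()            # b's next event still orders after the receive
```

Even though node b's wall clock is behind, its timestamps after the receive compare greater than the sender's, so causally related events stay ordered. Events with no message between them are ordered only up to the clock skew, which is why a bounded maximum offset still matters.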
  • 53. YugaByte ● Another Database Inspired by Spanner that relies on Hybrid Logical Clocks ● Currently only supports snapshot isolation ● Serializable isolation level work in progress ● Distributed Transactions to multiple partitions require a provisional record or temporary table
  • 54. FaunaDB - Consistency without Clocks ● Transaction resolution based on the Calvin protocol - pre-ordering of transactions before commit ● Global transaction ordering provides serializable consistency ● Transactions can include multiple rows - not restricted to data in a single row or shard ● Distributed log based algorithm scales throughput with cluster size by partitioning the log ● Low Latency Snapshot Reads ● Proprietary Query Language with a high learning curve ● Optimistic concurrency model can cause a high number of failures with highly contentious workloads
  • 55. References ● Bla-bla-microservices-bla-bla http://jonasboner.com/bla-bla-microservices-bla-bla/ ● Aphyr Strong consistency models - https://aphyr.com/posts/313-strong-consistency-models ● Achieving ACID Transactions in a Globally Distributed Database from FaunaDB ● Peter Bailis - Linearizability versus Serializability ● Calvin: fast distributed transactions for partitioned database systems