Strong Consistency in Databases. What does it actually guarantee? - Andy Gooding

Strong Consistency
in Databases
What does it actually guarantee?
Andrew Gooding
VP of Engineering, Aerospike

C: strong Consistency
A: maximum Availability
P: cluster Partitioning
Choose any two … as long as one is P!

Why Choose P?
Because P happens
• Nodes will crash
• Connections will break

So … C or A?
• C == Strong Consistency, which is well defined
• A == Maximum Availability, which != 100% availability
• Choosing C does not mean every P scenario causes
less than maximum availability
• Choosing A does not mean every P scenario breaks
consistency

What is C?
• Single linear progression of record version
• A successful operation must be part of the
progression and never be "lost"
• Will discuss observed version progression later
V₀ V₁ V₂ V₃ V₄
O₁ O₂ O₃ O₄
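
One way to picture a single linear progression is a record that accepts a write only when the writer has seen the current version. The sketch below is illustrative only: `VersionedRecord` and `apply` are hypothetical names, and the version check is an assumed mechanism, not something the slides prescribe.

```python
class VersionedRecord:
    """Hypothetical record that only accepts writes extending its latest version."""

    def __init__(self, value):
        self.version = 0
        self.value = value
        self.history = [(0, value)]  # every successful write stays in the progression

    def apply(self, expected_version, op):
        """Apply op only if the writer saw the current version; otherwise reject."""
        if expected_version != self.version:
            return False  # accepting this write would fork the lineage
        self.value = op(self.value)
        self.version += 1
        self.history.append((self.version, self.value))
        return True


record = VersionedRecord(100)
first = record.apply(0, lambda v: v + 5)   # accepted: V0 -> V1
second = record.apply(0, lambda v: v + 2)  # rejected: writer still believes V0 is current
```

The rejected write is reported as a failure rather than silently dropped, so no successful operation is ever "lost".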

What is A?
• Operations succeed as long as
any part of the cluster is visible
• Means that record progression
may split …

… which will break Consistency
Even if we later recover V₄ via "eventual consistency"
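[Diagram: the lineage splits at V₀ into two branches, V₀ → Va → Vb and V₀ → Vx → Vy, through operations O₁ to O₄; V₄ is the version recovered later by merging.]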

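Even in simple cases "eventual consistency" is messy …
[Diagram: V₀ = 100 splits into two branches. One branch: +5 gives Va = 105, then +7 gives Vb = 112. Other branch: +2 gives Vx = 102, then +9 gives Vy = 111. Merged result: V₄ = 123.]
• y & b not enough - must know value at split
• Store versions & values at splits
• Know when we're split or store all versions & values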
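
The counter example can be checked with a small three-way merge. This is a minimal sketch that assumes every operation is an addition; `merge_counter` is a hypothetical helper, not part of any database API.

```python
def merge_counter(base, left, right):
    """Three-way merge for a counter changed only by additions.

    base is the value at the split; left and right are the branch tips.
    """
    return base + (left - base) + (right - base)


v0, vb, vy = 100, 112, 111
merged = merge_counter(v0, vb, vy)  # 100 + 12 + 11 = 123
```

Without the value at the split (100) the two tips, 112 and 111, do not determine the merged result, which is why the versions and values at splits must be stored.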

In general it gets worse …
• Operations not derivable from values
[Diagram: V₀ = 4 becomes V₁ = 8. Was the operation +4 or x2?]
• Previous values not derivable from operations
[Diagram: md5() takes V₀ = ? to V₁ = 12bba]
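
Both points can be illustrated in a few lines. The candidate operations and the `md5_op` helper are hypothetical names chosen for this sketch; `hashlib.md5` is the standard-library hash.

```python
import hashlib

# Two different operations explain the same 4 -> 8 transition,
# so the operation cannot be derived from the values alone.
candidates = {"+4": lambda v: v + 4, "x2": lambda v: v * 2}
matching = [name for name, op in candidates.items() if op(4) == 8]


def md5_op(value: bytes) -> str:
    """A one-way operation: the digest does not reveal the previous value."""
    return hashlib.md5(value).hexdigest()


digest = md5_op(b"previous value")  # knowing the operation is md5 does not recover the input
```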

and worse …
• Operations may not commute
[Diagram: starting from V₀ = 100, +20 then ÷2 gives 120 then 60; ÷2 then +20 gives 50 then 70.]
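
The same arithmetic as a quick check, with `add20` and `halve` as hypothetical stand-ins for the two operations:

```python
def add20(v):
    return v + 20


def halve(v):
    return v // 2  # integer division keeps the example in whole numbers


v0 = 100
add_then_halve = halve(add20(v0))  # 100 -> 120 -> 60
halve_then_add = add20(halve(v0))  # 100 -> 50 -> 70
```

The two orders give 60 and 70, so a merge needs to know the order in which the operations were applied, not just which operations occurred.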

which all means …
• Can't just use version & value at split – must
know operations and times
• Store operations and times from splits
(or always)
• CRDTs and "user merge" options are
still complex and use case dependent
• General "eventual consistency" is intractable

Choosing A usually means:
Conflict resolution loses writes
[Diagram: V₀ splits into two branches. One branch applies O₁ then O₂ (V₀ → Vx → Vy); the other applies O₃, the latest write (V₀ → Va).]
• e.g. pick latest (Va) and lose two writes …
• … or pick most history (Vy) and lose latest write
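
The two resolution policies can be compared with a small sketch. It assumes one branch holds O₁ and O₂ and the other holds the later O₃; the `Branch` type, the timestamps and the helper names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    name: str
    ops: list             # operations applied since the split, in order
    last_write_time: int  # assumed logical timestamp of the newest write


def pick_latest(branches):
    """Keep the branch holding the most recent write."""
    return max(branches, key=lambda b: b.last_write_time)


def pick_most_history(branches):
    """Keep the branch holding the most writes."""
    return max(branches, key=lambda b: len(b.ops))


def lost_writes(branches, winner):
    """Every operation on a branch that was not kept is lost."""
    return [op for b in branches if b is not winner for op in b.ops]


vy = Branch("Vy", ["O1", "O2"], last_write_time=2)
va = Branch("Va", ["O3"], last_write_time=3)
branches = [vy, va]

lost_by_latest = lost_writes(branches, pick_latest(branches))         # O1 and O2
lost_by_history = lost_writes(branches, pick_most_history(branches))  # O3
```

Either policy discards at least one write that was reported as successful.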

Choosing A generally means:
• There are scenarios where record progression
(or "lineage") will split …
• … and be "observable" – by definition breaking C
• Conflict resolution of split lineages is complex and
requires storing lots of history …
• … or is simpler and results in data loss

Choosing C generally means:
• Disallow operations that would
cause record version splits
• Disallow observation of
(transient unresolved) "dirty"
versions
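[Diagram: an operation that would split a version in two is blocked (🚫); a "dirty" version (💩) is not observable]
Which means less than maximum availability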

With C, there are choices of read-only behavior:
• "Session" – single client must never see stale version
• "Linearized" – even across different clients, must
never see stale version
• "Relaxed" – ok to see "stale" version, e.g. V₄ then V₃
V₀ V₁ V₂ V₃ V₄
O₁ O₂ O₃ O₄
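
The three read behaviors can be told apart by checking a trace of reads. The sketch below is a simplification: it assumes reads do not overlap in time, are listed in real-time order, and are recorded as hypothetical `(client, version)` pairs.

```python
def is_linearized(reads):
    """No read, by any client, returns a version older than an earlier read."""
    newest_seen = -1
    for _client, version in reads:
        if version < newest_seen:
            return False
        newest_seen = version
    return True


def is_session_consistent(reads):
    """No single client ever sees a version older than one it already saw."""
    newest_per_client = {}
    for client, version in reads:
        if version < newest_per_client.get(client, -1):
            return False
        newest_per_client[client] = version
    return True


# c2 reads V3 after c1 has already read V4: fine per session, stale across clients.
across_clients = [("c1", 4), ("c2", 3), ("c2", 4)]
# The same client reads V4 and then V3: only acceptable under "Relaxed".
same_client = [("c1", 4), ("c1", 3)]
```

A trace that fails both checks, such as `same_client`, is permitted only by the "Relaxed" behavior.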

C – traditional "consensus" system:
• Successful operations must have
written to a majority of replicas
• 100% availability with n nodes
missing, given 2n+1 replicas
[Diagram: O₁ is written to the replicas, taking each from V to V']
• Need >= 3 replicas to do rolling
upgrades with 100% availability
• Preserves partial availability well by making use of all
connectivity

C – Alternative, "Kevin's system":
• Successful operations must have
written to RF replicas
• 100% availability with n nodes
missing, given n+1 replicas
[Diagram: O₁ is written to the replicas, taking each from V to V']
• Need >= 2 replicas to do rolling
upgrades with 100% availability
• From 100% availability in simple splits … partial
availability decreases quickly as connectivity worsens

"Kevin's system" pros:
• Fewer replicas needed to tolerate given number of
nodes down or …
• … tolerates more nodes down for given RF
• For given tolerance this means lower cost and higher
performance
Traditional system pros:
• Preserves higher partial availability in more severe
partition scenarios

C vs A – what do you prefer during "very rainy day" P?
• "light rain" P like upgrades can preserve both C & A
• Guaranteeing C sacrifices A on very rainy days
• Guaranteeing C may be more involved to deploy
• Guaranteeing C may cost more or hurt performance
• A systems sacrifice C on very rainy days
• "eventual consistency" can cost more than C
• Otherwise A systems lose data on very rainy days

Jepsen – tests many things, among them C:
V₀ V₁ V₂ V₃ V₄
O₁ O₂ O₃ O₄
• Single linear progression of record version
• Never lose writes
• Reads linearized – even across different clients,
never see stale version
Also measures availability loss and recovery
