Strong Consistency in Databases
What does it actually guarantee?
Andrew Gooding
VP of Engineering, Aerospike
C – strong Consistency
A – maximum Availability
P – cluster Partitioning
Choose any two … as long as one is P!
Why Choose P?
Because P happens
• Nodes will crash
• Connections will break
So … C or A?
• C == Strong Consistency, which is well defined
• A == Maximum Availability, which is not the same as 100% availability
• Choosing C does not mean every P scenario causes less than maximum availability
• Choosing A does not mean every P scenario breaks consistency
• A successful operation must be part of the progression and never be "lost"
What is C?
• Single linear progression of record versions
• Will discuss observed version progression later
V₀ →O₁→ V₁ →O₂→ V₂ →O₃→ V₃ →O₄→ V₄
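The definition of C can be checked mechanically. This is a minimal sketch, assuming a hypothetical log in which each successful operation is recorded in commit order as a (base version, new version) pair; `is_single_linear_progression` is an illustrative name, not an API of any particular database:

```python
def is_single_linear_progression(edges):
    """Return True if every operation was applied to the version produced by
    the operation before it, i.e. the record lineage never forks.

    edges: (base_version, new_version) pairs in commit order
    (a hypothetical log format used only for this illustration).
    """
    current = None
    for base, new in edges:
        if current is not None and base != current:
            # This operation built on an older version: the lineage split.
            return False
        current = new
    return True


linear = [("V0", "V1"), ("V1", "V2"), ("V2", "V3"), ("V3", "V4")]
forked = [("V0", "Va"), ("V0", "Vx"), ("Va", "Vb"), ("Vx", "Vy")]

print(is_single_linear_progression(linear))  # True
print(is_single_linear_progression(forked))  # False
```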
What is A?
• Operations succeed as long as any part of the cluster is visible
• Means that record progression may split …
… which will break Consistency, even if we later recover V₄ via "eventual consistency"
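[Figure: the lineage splits at V₀ into two branches, V₀ → V_a → V_b and V₀ → V_x → V_y, via operations O₁–O₄; the branches are later merged into V₄]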
Even in simple cases "eventual consistency" is messy …
• V_y & V_b are not enough – must know the value at the split
[Figure: V₀ = 100; one branch applies +5 (V_a = 105) then +7 (V_b = 112); the other applies +2 (V_x = 102) then +9 (V_y = 111); the merged V₄ = 123]
• Store versions & values at splits
• Know when we're split, or store all versions & values
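The merge arithmetic can be sketched as follows, assuming (as in the figure) that every operation is an addition; `merge_additive_branches` is a hypothetical helper. Each branch contributes its delta from the split value, which is why V_b and V_y alone are not enough:

```python
def merge_additive_branches(split_value, *branch_values):
    """Merge branches whose operations were all additions (an assumption).

    Each branch contributes its delta relative to the value at the split;
    without split_value there is no way to compute those deltas.
    """
    return split_value + sum(value - split_value for value in branch_values)


v0 = 100
vb = v0 + 5 + 7  # branch V0 -> Va -> Vb, ends at 112
vy = v0 + 2 + 9  # branch V0 -> Vx -> Vy, ends at 111

print(merge_additive_branches(v0, vb, vy))  # 123
```

Guessing the wrong split value silently gives a wrong answer: with 105 instead of 100, the same function returns 118.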
In general it gets worse …
• Operations not derivable from values: V₀ = 4 → V₁ = 8 (+4? ×2?)
• Previous values not derivable from operations: V₀ = ? → md5() → V₁ = 12bba
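A small illustration of both problems, using the slide's values: two different operations map 4 to 8, and md5 (here via Python's `hashlib`) can be computed forward but not reversed. The input bytes and helper names are made up for the example:

```python
import hashlib

# Observed change: V0 = 4, V1 = 8. Two different operations explain it.
candidate_ops = {
    "+4": lambda v: v + 4,
    "x2": lambda v: v * 2,
}
matching = [name for name, op in candidate_ops.items() if op(4) == 8]
print(matching)  # ['+4', 'x2']

# Replaying either guess on another branch's value gives different answers.
print({name: op(5) for name, op in candidate_ops.items()})  # {'+4': 9, 'x2': 10}


# Knowing the operation does not recover the previous value: md5 is one-way,
# so V0 can only be guessed and checked, never computed from V1.
def md5_hex(value: bytes) -> str:
    return hashlib.md5(value).hexdigest()


v1 = md5_hex(b"hypothetical V0")
print(len(v1))  # 32 hex characters
```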
and worse …
• Operations may not commute:
V₀ = 100 → +20 → 120 → ÷2 → 60
V₀ = 100 → ÷2 → 50 → +20 → 70
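The non-commuting pair from the slide, as a quick check. Integer halving (`//`) is an assumption that keeps the slide's numbers exact; the helper names are hypothetical:

```python
def add_20(v):
    return v + 20


def halve(v):
    return v // 2  # integer halving keeps the slide's numbers exact


def commutes(f, g, value):
    """True if applying f then g gives the same result as g then f."""
    return g(f(value)) == f(g(value))


print(halve(add_20(100)))            # 60
print(add_20(halve(100)))            # 70
print(commutes(add_20, halve, 100))  # False

# Pure additions do commute, which is why the additive counter case
# could be merged from values alone.
print(commutes(lambda v: v + 5, lambda v: v + 7, 100))  # True
```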
which all means …
• Can't just use version & value at split – must know operations and times
• Store operations and times from splits (or always)
• CRDTs and "user merge" options are still complex and use-case dependent
• General "eventual consistency" is intractable
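Storing operations and times makes a replay-based merge possible. This is a minimal sketch, assuming a hypothetical `LoggedOp` record and that timestamps from different nodes are directly comparable (itself a strong assumption); it is not a CRDT and not any product's merge logic:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LoggedOp:
    timestamp: float              # assumes clocks are comparable across nodes
    apply: Callable[[int], int]


def merge_by_replay(split_value: int, *branch_logs: List[LoggedOp]) -> int:
    """Hypothetical merge: replay all branches' operations from the split
    value, ordered by timestamp."""
    entries = sorted(
        (entry for log in branch_logs for entry in log),
        key=lambda entry: entry.timestamp,
    )
    value = split_value
    for entry in entries:
        value = entry.apply(value)
    return value


add_20 = LoggedOp(timestamp=1.0, apply=lambda v: v + 20)
halve = LoggedOp(timestamp=2.0, apply=lambda v: v // 2)
print(merge_by_replay(100, [add_20], [halve]))  # 60

# If the halving had been recorded earlier, the same operations give 70.
halve_first = LoggedOp(timestamp=0.5, apply=lambda v: v // 2)
print(merge_by_replay(100, [add_20], [halve_first]))  # 70
```

The merged value is only as trustworthy as the recorded times, and the log must be kept from the moment of the split (or always).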
Choosing A usually means: conflict resolution loses writes
[Figure: the lineage splits at V₀; O₁ and O₂ take one branch to V_x and then V_y, while the later O₃ takes the other branch to V_a]
• e.g. pick latest (V_a) and lose two writes …
• … or pick most history (V_y) and lose the latest write
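The two resolution policies can be compared on this split. This is a minimal sketch, assuming hypothetical per-branch logs of (timestamp, operation) pairs; neither policy is taken from a specific product:

```python
# Hypothetical op logs since the split at V0: (timestamp, operation name).
branch_y = [(1, "O1"), (2, "O2")]  # V0 -> Vx -> Vy
branch_a = [(3, "O3")]             # V0 -> Va, holds the latest write


def pick_latest(*branches):
    """Last-write-wins: keep the branch whose final write is newest."""
    return max(branches, key=lambda branch: branch[-1][0])


def pick_most_history(*branches):
    """Keep the branch with the most operations."""
    return max(branches, key=len)


def lost_writes(kept, *branches):
    """Operations that succeeded on some branch but are not in the kept one."""
    return sorted(op for branch in branches for op in branch if op not in kept)


winner = pick_latest(branch_y, branch_a)
print(lost_writes(winner, branch_y, branch_a))  # [(1, 'O1'), (2, 'O2')]

winner = pick_most_history(branch_y, branch_a)
print(lost_writes(winner, branch_y, branch_a))  # [(3, 'O3')]
```

Either way, at least one acknowledged write disappears; the choice only decides which.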
Choosing A generally means:
• There are scenarios where record progression (or "lineage") will split …
• … and be "observable" – by definition breaking C
• Conflict resolution of split lineages is complex and requires storing lots of history …
• … or is simpler and results in data loss
Choosing C generally means:
• Disallow operations that would cause record version splits
• Disallow observation of (transient, unresolved) "dirty" versions
[Figure: V splitting into V V is blocked 🚫; a dirty version V💩 must not be seen]
Which means less than maximum availability
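The "disallow splits" rule has a simple local analogy: a version-checked write that is rejected when it is based on an old version. This sketch is an assumption-laden illustration for a single copy of a record (`VersionedRecord` is hypothetical), not how a cluster enforces C across replicas; the rejected write is the availability cost described above:

```python
class VersionedRecord:
    """Hypothetical single record that rejects writes which would fork its
    lineage: every write must name the version it was based on."""

    def __init__(self, value):
        self.version = 0
        self.value = value

    def write(self, based_on_version, new_value):
        if based_on_version != self.version:
            # Accepting this would create a second child of an old version.
            return False
        self.version += 1
        self.value = new_value
        return True


record = VersionedRecord(100)
print(record.write(0, 105))          # True: builds on V0
print(record.write(0, 102))          # False: also builds on V0, would split
print(record.version, record.value)  # 1 105
```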
With C, there are choices of read-only behavior:
• "Session" – a single client must never see a stale version
• "Linearized" – even across different clients, must never see a stale version
• "Relaxed" – OK to see a "stale" version, e.g. V₄ then V₃
V₀ →O₁→ V₁ →O₂→ V₂ →O₃→ V₃ →O₄→ V₄
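The read modes differ in what counts as "stale". This is a minimal sketch that checks only the "never go backwards" part, assuming versions are integers (V₃ → 3) and that reads are listed in real-time order without overlapping; the checker names are hypothetical:

```python
def violates_session(versions_seen):
    """versions_seen: integer versions one client read, in order.
    Session behavior is violated if that client ever goes backwards."""
    return any(later < earlier
               for earlier, later in zip(versions_seen, versions_seen[1:]))


def violates_linearized(reads):
    """reads: (client, version) pairs for all clients in real-time order,
    assuming no two reads overlap. Violated if any read returns a version
    older than one already returned to any client."""
    newest = float("-inf")
    for _client, version in reads:
        if version < newest:
            return True
        newest = max(newest, version)
    return False


reads = [("A", 4), ("B", 3)]  # client A sees V4, later client B sees V3

print(violates_session([4]), violates_session([3]))  # False False
print(violates_linearized(reads))                     # True
print(violates_session([4, 3]))  # True: only "relaxed" allows this
```

The same pair of reads is acceptable under "session" (each client saw only one version) but not under "linearized".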
C – traditional "consensus" system:
• Successful operations must have written to a majority of replicas
• 100% availability with n nodes missing, given 2n+1 replicas
[Figure: O₁ is written to the replicas, moving each from V to V′]
• Need >= 3 replicas to do rolling upgrades with 100% availability
• Preserves partial availability well by making use of all connectivity
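The majority rule and the 2n+1 formula can be checked directly. This is a minimal sketch; `majority_available` is a hypothetical helper that only counts reachable replicas and ignores how a partition splits clients:

```python
def majority_available(replicas: int, missing: int) -> bool:
    """Writes can succeed while a strict majority of replicas is reachable."""
    return replicas - missing > replicas // 2


for n in range(1, 4):
    replicas = 2 * n + 1
    print(replicas, majority_available(replicas, n), majority_available(replicas, n + 1))
# 3 True False
# 5 True False
# 7 True False

# A rolling upgrade takes one node down at a time.
print(majority_available(2, 1))  # False: 2 replicas are not enough
print(majority_available(3, 1))  # True
```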
C – Alternative, "Kevin's system":
• Successful operations must have written to RF (replication factor) replicas
• 100% availability with n nodes missing, given n+1 replicas
[Figure: O₁ is written to the replicas, moving each from V to V′]
• Need >= 2 replicas to do rolling upgrades with 100% availability
• From 100% availability in simple splits … partial availability decreases quickly as connectivity worsens
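Putting the two replica formulas side by side shows the savings of the alternative. `replicas_needed` and the scheme labels `"majority"` and `"rf_writes"` are hypothetical; the sketch covers only the "n nodes missing" case, not the partial availability under worse partitions:

```python
def replicas_needed(tolerated_missing: int, scheme: str) -> int:
    """Replicas needed for 100% availability with `tolerated_missing` nodes
    down, using the 2n+1 and n+1 formulas for the two systems."""
    if scheme == "majority":       # traditional consensus system
        return 2 * tolerated_missing + 1
    if scheme == "rf_writes":      # hypothetical label for "Kevin's system"
        return tolerated_missing + 1
    raise ValueError(f"unknown scheme: {scheme}")


for n in (1, 2, 3):
    print(n, replicas_needed(n, "majority"), replicas_needed(n, "rf_writes"))
# 1 3 2
# 2 5 3
# 3 7 4
```

The n = 1 row is the rolling-upgrade case: 3 replicas for the majority scheme versus 2 for the alternative.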
"Kevin's system" pros:
• Fewer replicas needed to tolerate a given number of nodes down, or …
• … tolerates more nodes down for a given RF
• For a given tolerance, this means lower cost and higher performance
Traditional system pros:
• Preserves higher partial availability in more severe partition scenarios
• "Light rain" P, like upgrades, can preserve both C & A
• Guaranteeing C sacrifices A on very rainy days
• Guaranteeing C may be more involved to deploy
• Guaranteeing C may cost more or hurt performance
• A systems sacrifice C on very rainy days
• "Eventual consistency" can cost more than C
• Otherwise A systems lose data on very rainy days
C vs A – what do you prefer during "very rainy day" P?
Jepsen – tests many things, among them C:
V₀ →O₁→ V₁ →O₂→ V₂ →O₃→ V₃ →O₄→ V₄
• Single linear progression of record versions
• Never lose writes
• Reads linearized – even across different clients, never see a stale version
Also measures availability loss and recovery
Thank You…

Editor's Notes

  • #4 You can avoid P with a single-node cluster … where the node never goes down
  • #5 A good distributed system should allow upgrade scenarios that preserve both C and A
  • #6 Single-record transactions
  • #14 Merge preference is application-dependent
  • #17 When do we need linearized reads? When can we use relaxed reads?