Eventually Consistent Data Structures (from strangeloop12)


There are many reasons to use an eventually consistent database (such as Riak, Voldemort, or Cassandra), including increased availability, lower latency, and fault tolerance. However, doing so requires a mental shift in how to structure client applications, and certain traditional data structures, like sets, registers, and counters, can’t be resolved simply in the face of race conditions. It is difficult to achieve “logical monotonicity” except for the most trivial data types. That is, until the advent of Convergent Replicated Data Types (CRDTs). CRDTs are data structures that tolerate eventual consistency. They replace traditional data-structure implementations, and all have the property that, given any number of conflicting versions of the same datum, there is a single state on which they converge (monotonicity). This talk will discuss some of the most useful CRDTs and how to apply them to solve real-world data problems.


Eventually-Consistent Data Structures
Sean Cribbs
@seancribbs #CRDT
StrangeLoop 2012

Riak is
Eventually
Consistent
So are Voldemort and Cassandra

Duals or Duels?
object-oriented / functional
static / dynamic
consistency / availability
throughput / latency
threaded / evented
safety / liveness

Safety / Liveness
Proving the Correctness of Multiprocess Programs - Leslie Lamport (March 1977)

• Safety: “nothing bad happens” (partial correctness)
• Liveness: “something good eventually happens” (termination)

Eventual Consistency

• Replicated
• Loose coordination
• Convergence

Eventual is Good

✔ Fault-tolerant
✔ Highly available
✔ Low-latency

Consistency?

No clear winner!
• Throw one out? (Cassandra)
• Keep both? (Riak & Voldemort)

Semantic Resolution

• Your app knows the domain - use business rules to resolve
• Amazon Dynamo’s shopping cart

“Ad hoc approaches have proven brittle and error-prone”

Conflict-Free Replicated Data Types

• Conflict-Free: resolves automatically toward a single value
• Replicated: multiple independent copies
• Data Types: useful abstractions

http://db.cs.berkeley.edu/papers/UCB-lattice-tr.pdf

Bounded Join Semi-Lattices
〈S, ⊔, ⊥〉

‣ S is a set
‣ ⊔ is a least-upper bound (join/merge) on S
‣ ⊥ ∈ S
‣ ∀x, y ∈ S: x ≤_S y ⇔ x ⊔ y = y
‣ ∀x ∈ S: x ⊔ ⊥ = x

lmax Lattice

S ≔ ℝ
a ⊔ b ≔ max(a, b)
⊥ ≔ -∞
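
The lmax definition above can be sketched in a few lines of Python. This is only an illustration: floats stand in for the reals, and the names `BOTTOM` and `merge` are made up for the example rather than taken from the talk or the paper.

```python
import math
from functools import reduce
from itertools import permutations

BOTTOM = -math.inf  # the least element: merging it with any x returns x


def merge(a: float, b: float) -> float:
    """Join for lmax: the least upper bound of two numbers is their maximum."""
    return max(a, b)


# Values observed at three replicas, merged in every possible order.
observed = [3.0, 7.5, 1.0]
results = {reduce(merge, order, BOTTOM) for order in permutations(observed)}
print(results)  # {7.5}: every merge order converges on the same value
```

Because `max` is commutative, associative, and idempotent, the order and number of merges do not matter, which is the property the rest of the talk relies on.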

lset Lattice

[Figure: lattice diagram of sets with a Time axis; rows listed from top to bottom]

{a,b,c,d,e}
{a,b,c,d} {b,c,d,e}
{a,b,c} {b,c,d} {c,d,e}
{a,b} {b,c} {c,d} {d,e}
{a} {b} {c} {d} {e}
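
A small Python sketch of the lset lattice, assuming (as the diagram suggests) that the merge is set union. It also derives the partial order from the merge, using the rule that x is below y exactly when merging them yields y. The function names are illustrative only.

```python
def merge(x: frozenset, y: frozenset) -> frozenset:
    """Join for lset: the least upper bound of two sets is their union."""
    return x | y


def leq(x: frozenset, y: frozenset) -> bool:
    """Partial order induced by the join: x <= y exactly when merge(x, y) == y."""
    return merge(x, y) == y


ab = frozenset("ab")
bc = frozenset("bc")
abc = merge(ab, bc)

print(sorted(abc))               # ['a', 'b', 'c']
print(leq(ab, abc))              # True: {a,b} sits below {a,b,c}
print(leq(ab, bc), leq(bc, ab))  # False False: neither set is below the other
```

The last line is why the order is only partial: two replicas can hold states that are not comparable, and the merge is what brings them to a common upper bound.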

CRDT Flavors

• Convergent: State
  • Weak messaging requirements
• Commutative: Operations
  • Reliable broadcast required
  • Causal ordering sufficient
Registers

• Last-Write Wins (LWW-Register)
  • e.g. Columns in Cassandra
• Multi-Valued (MV-Register)
  • e.g. Objects (values) in Riak
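
A minimal sketch of an LWW-Register in Python, assuming each write carries a timestamp from the writer's clock. The `replica` field is a hypothetical tie-breaker added for the example so that equal timestamps still resolve the same way everywhere; it is not a description of how Cassandra does it.

```python
from typing import Any, NamedTuple


class LWWRegister(NamedTuple):
    timestamp: float  # assumed to come from the writer's clock
    replica: str      # hypothetical replica id, used only to break ties
    value: Any


def merge(x: LWWRegister, y: LWWRegister) -> LWWRegister:
    """Keep the write with the larger (timestamp, replica) pair."""
    return x if (x.timestamp, x.replica) >= (y.timestamp, y.replica) else y


w1 = LWWRegister(100.0, "a", "red")
w2 = LWWRegister(105.0, "b", "blue")

print(merge(w1, w2).value)  # blue
print(merge(w2, w1).value)  # blue, regardless of argument order
```

The earlier write is silently discarded, which is the "throw one out" strategy. An MV-Register would instead keep both concurrent values, which needs causality metadata and is not sketched here.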

G-Counter
// Starts empty
[]

// A increments twice, forwarding state
[{a,1}] // == 1
[{a,2}] // == 2

// B increments
[{b,1}] // == 1

// Merging
[{a,2}, {b,1}] [{a,1}, {b,1}]
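
The G-Counter steps above can be sketched in Python, with a dict from replica name to count standing in for the list of `{replica, count}` pairs on the slide. The function names are illustrative, not taken from any of the libraries mentioned in the talk.

```python
def increment(counter: dict, replica: str, amount: int = 1) -> dict:
    """Return a new state in which `replica` has bumped only its own entry."""
    updated = dict(counter)
    updated[replica] = updated.get(replica, 0) + amount
    return updated


def merge(x: dict, y: dict) -> dict:
    """Join two states by taking the per-replica maximum."""
    return {r: max(x.get(r, 0), y.get(r, 0)) for r in x.keys() | y.keys()}


def value(counter: dict) -> int:
    """The counter's value is the sum over all replica entries."""
    return sum(counter.values())


a = increment(increment({}, "a"), "a")  # [{a,2}]
b = increment({}, "b")                  # [{b,1}]
merged = merge(a, b)                    # [{a,2},{b,1}]

print(value(a), value(b), value(merged))  # 2 1 3
```

Each replica only ever writes its own entry, so taking the maximum per entry never loses an increment and merging the same state twice changes nothing.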

PN-Counter
// A PN-Counter
{
P = [{a,10},{b,2}],
N = [{a,1},{c,5}]
}
// == (10+2)-(1+5) == 12-6 == 6
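
A PN-Counter can be sketched as two grow-only counters, one for increments and one for decrements. The Python below is an illustration under that assumption; the class and helper names are invented, and the second replica state is made up to show a merge.

```python
from typing import NamedTuple


def merge_g(x: dict, y: dict) -> dict:
    """Per-replica maximum, the same join a G-Counter uses."""
    return {r: max(x.get(r, 0), y.get(r, 0)) for r in x.keys() | y.keys()}


class PNCounter(NamedTuple):
    p: dict  # increments, per replica
    n: dict  # decrements, per replica

    def value(self) -> int:
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other: "PNCounter") -> "PNCounter":
        # Merge the two halves independently.
        return PNCounter(merge_g(self.p, other.p), merge_g(self.n, other.n))


# The state shown on the slide.
counter = PNCounter(p={"a": 10, "b": 2}, n={"a": 1, "c": 5})
print(counter.value())  # 6

# A hypothetical replica that has seen one more decrement from c.
other = PNCounter(p={"a": 10}, n={"c": 6})
print(counter.merge(other).value())  # 5
```

Keeping the decrements in their own grow-only half is what lets the value go down while both halves still only ever grow.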

G-Set
// Starts empty
{}

// A adds a and b, forwarding state
{a}
{a,b}

// B adds c
{c}

// Merging
{a,b,c} {a,c}

2P-Set
// Starts empty
{A={},R={}}

// A adds a and b, forwarding state,
// removes a
{A={a}, R={}} // == {a}
{A={a,b},R={}} // == {a,b}
{A={a,b},R={a}} // == {b}

// B adds c
{A={c},R={}} // == {c}
// Merging
{A={a,b,c},R={a}} {A={a,c}, R={}}
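
The 2P-Set walkthrough above can be sketched in Python as a pair of grow-only sets, one of additions and one of removals (tombstones). The class and method names are illustrative only.

```python
from typing import NamedTuple


class TwoPSet(NamedTuple):
    added: frozenset = frozenset()
    removed: frozenset = frozenset()

    def add(self, item) -> "TwoPSet":
        return TwoPSet(self.added | {item}, self.removed)

    def remove(self, item) -> "TwoPSet":
        # Record a tombstone; a removed item can never be re-added.
        return TwoPSet(self.added, self.removed | {item})

    def members(self) -> frozenset:
        return self.added - self.removed

    def merge(self, other: "TwoPSet") -> "TwoPSet":
        # Both halves are grow-only sets, so each merges by union.
        return TwoPSet(self.added | other.added, self.removed | other.removed)


a = TwoPSet().add("a").add("b").remove("a")  # {A={a,b},R={a}}
b = TwoPSet().add("c")                       # {A={c},R={}}
merged = a.merge(b)                          # {A={a,b,c},R={a}}

print(sorted(merged.members()))           # ['b', 'c']
print(sorted(merged.add("a").members()))  # ['b', 'c']: the tombstone wins
```

The second print shows the main limitation of this design: once an element has been removed, adding it again has no visible effect.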

Use-Cases

• Social graph (OR-Set or a Graph)
• Web page visits (G-Counter)
• Shopping Cart (Modified OR-Set)
• “Like” button (U-Set)

Challenges: GC

• CRDTs are inefficient
• Synchronization may be required

Challenges: Responsibility

• Client
  • Erlang: mochi/statebox
  • Clojure: reiddraper/knockbox
  • Ruby: aphyr/meangirls, bkerley/hanover
• Server


1. Eventually- Consistent Data Structures Sean Cribbs @seancribbs #CRDT StrangeLoop 2012

2. I work for Basho We make

3. Riak is Eventually Consistent So are Voldemort and Cassandra

4. No ACID!

11. Duals or Duels? object-oriented / functional static / dynamic consistency / availability throughput / latency threaded / evented safety / liveness

15. Safety / Liveness Proving the Correctness of Multiprocess Programs - Leslie Lamport (March 1977) • Safety: “nothing bad happens” (partial correctness) • Liveness: “something good eventually happens” (termination) “Safety and liveness: Eventual consistency is not safe” - Peter Bailis http://www.bailis.org/blog/safety-and-liveness-eventual-consistency-is-not-safe/

16. Eventual Consistency Replicated Loose coordination Convergence

17. Eventual is Good ✔ Fault-tolerant ✔ Highly available ✔ Low-latency

20. Consistency? No clear winner! Throw one out? (Cassandra) Keep both? (Riak & Voldemort)

21. Conflicts! A! B!

23. Semantic Resolution • Your app knows the domain - use business rules to resolve • Amazon Dynamo’s shopping cart “Ad hoc approaches have proven brittle and error-prone”

27. Conflict-Free (resolves automatically toward a single value) Replicated (multiple independent copies) Data Types (useful abstractions)

28. http://db.cs.berkeley.edu/papers/UCB-lattice-tr.pdf

35. Bounded Join Semi-Lattices 〈S, ⊔, ⊥〉 ‣ S is a set ‣ ⊔ is a least-upper bound (join/merge) on S ‣ ⊥ ∈ S ‣ ∀x, y ∈ S: x ≤_S y ⇔ x ⊔ y = y ‣ ∀x ∈ S: x ⊔ ⊥ = x

36. lmax Lattice S ≔ ℝ a ⊔ b ≔ max(a, b) ⊥ ≔ -∞

37. lset Lattice {a,b,c,d,e} {a,b,c,d} {b,c,d,e} {b,c,d} Time {a,b,c} {c,d,e} {b,c} {c,d} {d,e} {a,b} {a} {b} {c} {d} {e}

39. CRDT Flavors • Convergent: State • Weak messaging requirements •Commutative: Operations •Reliable broadcast required •Causal ordering sufficient

40. Convergent CRDTs

41. Commutative CRDTs

42. Registers A place to put your stuff

43. Registers • Last-Write Wins (LWW-Register) • e.g. Columns in Cassandra • Multi-Valued (MV-Register) • e.g. Objects (values) in Riak

44. Counters Keeping tabs

49. G-Counter // Starts empty [] // A increments twice, forwarding state [{a,1}] // == 1 [{a,2}] // == 2 // B increments [{b,1}] // == 1 // Merging [{a,2}, {b,1}] [{a,1}, {b,1}]

50. PN-Counter // A PN-Counter { P = [{a,10},{b,2}], N = [{a,1},{c,5}] } // == (10+2)-(1+5) == 12-6 == 6

51. Sets Members Only

56. G-Set // Starts empty {} // A adds a and b, forwarding state {a} {a,b} // B adds c {c} // Merging {a,b,c} {a,c}

61. 2P-Set // Starts empty {A={},R={}} // A adds a and b, forwarding state, // removes a {A={a}, R={}} // == {a} {A={a,b},R={}} // == {a,b} {A={a,b},R={a}} // == {b} // B adds c {A={c},R={}} // == {c} // Merging {A={a,b,c},R={a}} {A={a,c}, R={}}

62. LWW-Element-Set

63. OR-Set

64. G = (V,E) Graphs E⊆V×V

67. Use-Cases • Social graph (OR-Set or a Graph) • Web page visits (G-Counter) • Shopping Cart (Modified OR-Set) • “Like” button (U-Set)

68. Challenges: GC • CRDTs are inefficient • Synchronization may be required

69. Challenges: Responsibility • Client • Erlang: mochi/statebox • Clojure: reiddraper/knockbox • Ruby: aphyr/meangirls, bkerley/hanover • Server

70. Thanks

Editor's Notes

There’s no ACID! But don’t worry, there’s no need to be upset, despite what you may have heard.
I think the fear people have about giving up ACID is really just a tendency to see things in black and white, because subtlety is much harder to understand and accept. Every day in the wider technical community and on the Internet we are presented with binary choices that are often not really in conflict, but are either orthogonal (albeit related) concepts or simply different ends of a spectrum. We too often perceive a Hegelian dialectic when one doesn’t exist (without the synthesis part!).
An important pair that we need to understand, but that is not frequently discussed outside of academia, is safety and liveness.
Safety and liveness are terms that were defined for concurrent programs in this 1977 paper by Leslie Lamport. Colloquially, safety means that in the course of running your program, “nothing bad will happen”, and liveness means that “something good will eventually happen”. Both are desirable properties, but sometimes enforcing one property may cause you to give up the other. I thought Peter Bailis stated this eloquently in his recent blog post: eventual consistency is not safe by itself, but is a trivially satisfiable liveness property. That is, it helps keep your system available, but doesn’t make any guarantees about whether correct answers will be given at all. His larger point was that practical systems like Riak, Voldemort, and Cassandra do make safety guarantees but tend not to state them. It’s not all “garbage”.
In an eventually consistent system, you tend to have multiple copies of the same datum, which means that it’s replicated. They also tend to allow loose coordination and things like sloppy quorums, since you don’t require expensive multi-phase commit protocols. This also makes them resilient to network partitions, which DO EXIST. Eventually consistent systems must also include means for state to move forward when staleness is detected. In Dynamo-like systems, this is usually done with read-repair, that is, writing the newer value to stale replicas when reading.
While not as simple to understand as an ACID system, eventual consistency has many practical benefits. When encountering failures, especially network-related ones, the system can more often remain available to reads and writes despite the failures. In the same vein, relying on dynamic participation in operations lends itself to systems with low, consistent latency because only promptly-responding replicas need to be considered.
Of course the tradeoff of those benefits, thanks to the CAP theorem, is that you sacrifice strict consistency. There is no total ordering of events in the system, you have no transactions, and you have weak guarantees of delivery at best. This means it’s incredibly difficult to decide who wins when there are concurrent writes in the system. The solutions to the problem are both non-ideal, but they are generally: first, to throw one version out by applying an arbitrary ordering, usually a timestamp of sorts; second, to keep both values around and let the user decide. These are the approaches of Cassandra and of Riak/Voldemort, respectively.
So maybe you chose Riak or Voldemort, and you get write conflicts (Riak calls them siblings). Now that you’ve got both values, how do you decide what the real state should be?
One strategy, which I call “semantic resolution”, is to say that your application encodes the domain of the problem and so it can use business rules to resolve the conflict. This is the strategy implemented by the “shopping cart” described in the Amazon Dynamo paper. It merges toward the maximum quantity of each item in the cart; however, it exhibits some problems, namely that sometimes items that were removed from the cart can reappear! From Amazon’s point of view this is okay because it might encourage the customer to buy more, but it is a bewildering user experience!
Fortunately, there is some interesting recent research about a more rigorous approach to eventual consistency.
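
The reappearing-item problem is easy to reproduce. The Python sketch below follows the description in these notes (merge toward the maximum quantity of each item); the cart representation as a dict and the item names are assumptions made for the example, not details from the Dynamo paper.

```python
def resolve(siblings: list) -> dict:
    """Merge conflicting carts by keeping the maximum quantity of each item."""
    merged = {}
    for cart in siblings:
        for item, quantity in cart.items():
            merged[item] = max(merged.get(item, 0), quantity)
    return merged


# Both clients start from the same cart.
original = {"book": 1, "lamp": 2}

# Client 1 removes the lamp; client 2 concurrently adds a pen.
cart_1 = {k: v for k, v in original.items() if k != "lamp"}
cart_2 = {**original, "pen": 1}

resolved = resolve([cart_1, cart_2])
print(resolved)  # the removed lamp is back in the cart
```

The merge cannot tell "the lamp was removed" apart from "this replica never saw the lamp", because an absent key carries no information. Tombstones, as in the 2P-Set, are one way to record the removal explicitly.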
...and that is Conflict-Free Replicated Data Types. This basically means that instead of strictly opaque values, the datastore provides useful abstract data structures. Since we’re in an eventually consistent system, the data structure is replicated to multiple locations, all of which act independently. But by far the most compelling part is that these data structures have the ability to resolve automatically toward a single value, given any number of conflicting values at individual replicas. CRDTs provide a strong safety property for eventually consistent systems that doesn’t sacrifice liveness in the process.
The theory behind what I’m going to talk about is the idea of bounded join semi-lattices, or “lattices” for short, and is rooted in the theory of monotonic logic. The definition I’m giving here comes from a recent paper by Neil Conway and others at UC-Berkeley.
A lattice is a triple of a set, a function, and a value. S is a set (possibly infinite) representing the possible values of the lattice. The upside-down T is the &#x201C;least element&#x201D; of the set. The &#x201C;square U&#x201D; is a binary operator over S that produces a least-upper bound of its operands that is also a member of S, also called the &#x201C;join&#x201D; or &#x201C;merge&#x201D; operator. The merge operator is commutative, associative, and idempotent. Finally, a lattice has the property such that for any two members of the set S, the merge operator creates a partial ordering over the set. This also means that merging any element with the least element is an identity operation.\n
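The properties in the definition above can be written compactly. The symbols follow the slide's notation; the ordering symbol \le_S is shorthand introduced here for illustration, not taken from the slides.

```latex
\begin{align*}
&\langle S, \sqcup, \bot \rangle \\
x \sqcup y &= y \sqcup x && \text{(commutative)} \\
(x \sqcup y) \sqcup z &= x \sqcup (y \sqcup z) && \text{(associative)} \\
x \sqcup x &= x && \text{(idempotent)} \\
x \le_S y &\iff x \sqcup y = y && \text{(induced partial order)} \\
x \sqcup \bot &= x && \text{(least element is the identity)}
\end{align*}
```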
Just for the sake of illustration, let’s look at one of the simpler lattices defined in Conway’s paper, the “lmax” lattice. The set of values in the lattice is the real numbers. The merge function is defined as taking the maximum of the two values. The minimum value is negative infinity. I hope you can see that this definition is a lattice: nothing is less than negative infinity, and the merging of any two values trends toward positive infinity, without exceeding the seen values.
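A minimal Python sketch of the “lmax” lattice as described above. The names `BOTTOM`, `merge`, and `check_lattice_laws` are made up for illustration, and the law check is a brute-force test over a handful of sample values, not a proof.

```python
import math
from itertools import product

BOTTOM = -math.inf  # least element of the lmax lattice


def merge(a: float, b: float) -> float:
    """Join of two lmax values: the larger of the two."""
    return max(a, b)


def check_lattice_laws(samples) -> bool:
    """Brute-force check of the merge laws over a small sample of values."""
    for x, y, z in product(samples, repeat=3):
        assert merge(x, y) == merge(y, x)                      # commutative
        assert merge(merge(x, y), z) == merge(x, merge(y, z))  # associative
    for x in samples:
        assert merge(x, x) == x        # idempotent
        assert merge(x, BOTTOM) == x   # merging with bottom is the identity
    return True


check_lattice_laws([BOTTOM, -3.5, 0.0, 10.0, 15.0])
print(merge(10, 15))  # the larger of the two conflicting values wins
```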
Let’s take another example for those who might be visual learners, the lset lattice. The set of values for the lattice is all simple sets, with the empty set being the minimum value. The merge function is set-union, which, as you should be able to see in this diagram, allows any ordering of operation delivery to eventually converge on the same value. This diagram doesn’t even show all of the possible orderings, in fact.

Now why is this stuff important? Remember how we had conflicts and we needed a sane way to resolve those conflicts? Lattices are a generic type that gives us determinism in how we merge our conflicts. In the case of the “lmax” lattice, if one value has 10 and another has 15, you pick 15 because it’s the larger one. This foundation gives us what we need to understand a larger study of the topic of conflict-resolution in eventual consistency.
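The order-independence of the lset merge can be checked by brute force. This is a sketch under assumptions: the three replica states are hypothetical, and `converge` is an illustrative helper that simply folds delivered states together starting from the empty set.

```python
from functools import reduce
from itertools import permutations


def merge(a: frozenset, b: frozenset) -> frozenset:
    """Join of two lset values: set union."""
    return a | b


def converge(deliveries) -> frozenset:
    """Fold a sequence of delivered states into one, starting from the empty set."""
    return reduce(merge, deliveries, frozenset())


# Hypothetical states produced independently at three replicas.
states = [frozenset({"a"}), frozenset({"b"}), frozenset({"a", "c"})]

# Try every delivery order, re-delivering the first state at the end each time.
results = {converge(order + (order[0],)) for order in permutations(states)}
print(results)  # a single distinct outcome means every ordering converged
```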
The primary work on this research has been done by two researchers at INRIA and their colleagues in Portugal. Marc Shapiro also gave a great talk on the subject at Microsoft Research called “Strong Eventual Consistency” which you can easily find online.

The paper above is where I’ve gotten most of the content and diagrams, but I’ve tried to simplify the content so that we can get through it in the scope of this talk. If you want the real thing, search for <title>, it’s free to download.
There are two flavors of CRDTs, as you might have noticed. They both provide the same conflict-free property, but differ in their implementation strategy.

Convergent types are based on a local modification of state, followed by forwarding the resulting state downstream, where a merge operation is performed at other replicas. The state itself encodes all information needed to converge. They are great for systems with weak message delivery guarantees, for example a Dynamo-style system. Convergent types can also be resolved in clients, which is helpful for systems that do not provide rich datatypes.

Commutative types, on the other hand, replicate commutative operations rather than state, and tend to rely on systems with reliable broadcast (which assures that operations reach all replicas). Operations are generally not required to have a total ordering; a local causal ordering is sufficient.
This diagram from the paper shows the basic format of a convergent, state-based CRDT. Note how the mutation is applied locally, then forwarded downstream as a merge operation. As long as all replicas eventually receive states that include all mutations, they will converge on the same value. (The merge function is basically the merge function in a lattice.)
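The update-then-merge flow can be sketched with a toy replica. `StateReplica` is a hypothetical class that holds a grow-only set of strings; it is meant to show the shape of a state-based type, not any datastore's actual API.

```python
class StateReplica:
    """State-based (convergent) replica holding a grow-only set of strings."""

    def __init__(self):
        self.state = frozenset()

    def update(self, element: str) -> frozenset:
        """Apply the mutation locally and return the new state to ship downstream."""
        self.state = self.state | {element}
        return self.state

    def merge(self, remote_state: frozenset) -> None:
        """Join a received state into the local one (the lattice merge)."""
        self.state = self.state | remote_state


r1, r2, r3 = StateReplica(), StateReplica(), StateReplica()
s_x = r1.update("x")
s_y = r2.update("y")

# Delivery is unordered and may repeat; r3 never hears from r1 directly.
r2.merge(s_x)
r2.merge(s_x)        # duplicate delivery is harmless because merge is idempotent
r1.merge(s_y)
r3.merge(r2.state)   # r3 learns of both updates transitively through r2

assert r1.state == r2.state == r3.state == frozenset({"x", "y"})
```

Because each forwarded state carries everything its sender has seen, a lost message is repaired by any later state that includes it.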
Again, commutative types forward operations to other replicas, not the state. Obviously, if an operation is not delivered, or is applied out of order locally, the states don’t converge. So, unlike the convergent type, a reliable broadcast channel is required. As long as functions f() and g() commute, state will converge.
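A small sketch of why commutativity matters for operation-based types. It assumes every operation is delivered exactly once to every replica and models each replica's delivery order as one permutation of the operations; `add`, `assign`, and `final_states` are illustrative names.

```python
from itertools import permutations


def add(n):
    """Return an operation that adds n to a counter state."""
    return lambda state: state + n


def assign(v):
    """Return an operation that overwrites the state with v."""
    return lambda state: v


def apply_all(ops, state=0):
    """Apply operations in the given order, starting from the initial state."""
    for op in ops:
        state = op(state)
    return state


def final_states(ops):
    """Set of final states reachable over every delivery order of ops."""
    return {apply_all(order) for order in permutations(ops)}


commuting = final_states([add(5), add(-2), add(10)])
conflicting = final_states([assign(1), assign(2)])
print(commuting)    # one final state: every delivery order agrees
print(conflicting)  # two final states: replicas can diverge
```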
A register is the simplest type of data structure: a memory cell storing an opaque value. It only supports two operations, “assign” and “value” (set and get). Concurrent updates will not commute (who should win?). We’ve seen this problem before.
The two approaches to concurrent resolution are the same ones taken by Cassandra and Riak, respectively. That is, Last-Write-Wins (called an LWW-Register) and Multi-Valued (called an MV-Register), which keeps all divergent values. For resolution, LWW tends to use timestamps with a reasonable guarantee of ordering (which is difficult in practice, but in some systems sufficient). MV, on the other hand, requires the more expensive version vector to resolve conflicts and produces the union of all divergent values (but it doesn’t behave like a set!).
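Both register flavors can be sketched in a few lines. This is an illustration under assumptions, not how Cassandra or Riak implement them: the LWW register breaks timestamp ties by replica ID (a choice made here so that merge is deterministic), and the MV register is modeled as a list of (value, version vector) pairs with hypothetical helpers `mv_assign` and `mv_merge`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """Last-write-wins register: the highest (timestamp, replica_id) survives."""
    value: object
    timestamp: float
    replica_id: str

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        mine = (self.timestamp, self.replica_id)
        theirs = (other.timestamp, other.replica_id)
        return other if theirs > mine else self


def dominates(a: dict, b: dict) -> bool:
    """True if version vector a is >= b in every slot and > b in at least one."""
    keys = a.keys() | b.keys()
    return (all(a.get(k, 0) >= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) > b.get(k, 0) for k in keys))


def mv_assign(state: list, value, replica_id: str) -> list:
    """Overwrite everything seen locally with one value whose vector dominates."""
    vv = {}
    for _, seen in state:
        for k, n in seen.items():
            vv[k] = max(vv.get(k, 0), n)
    vv[replica_id] = vv.get(replica_id, 0) + 1
    return [(value, vv)]


def mv_merge(a: list, b: list) -> list:
    """Keep every (value, vector) pair that no other pair dominates."""
    pairs = []
    for pair in a + b:
        if pair not in pairs:
            pairs.append(pair)
    return [p for p in pairs if not any(dominates(q[1], p[1]) for q in pairs)]


# LWW: the later timestamp wins regardless of merge direction.
x = LWWRegister("x", 10.0, "r1")
y = LWWRegister("y", 15.0, "r2")
assert x.merge(y) == y.merge(x) == y

# MV: concurrent assignments are both kept until a later assign supersedes them.
a = mv_assign([], "x", "r1")
b = mv_assign([], "y", "r2")
siblings = mv_merge(a, b)
resolved = mv_assign(siblings, "z", "r1")
print(sorted(v for v, _ in siblings))
print(mv_merge(resolved, siblings))
```

The MV register returns both siblings after the concurrent writes, and only the superseding value once a replica that has seen both assigns again.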
Counters are simply integers that are replicated and support the increment and decrement operations. Counters are useful for things like tracking the number of logged-in users, or click-throughs on an advertisement.

The simplest type of counter is a Commutative or operation-based type: since add and subtract are commutative, any delivery order is sufficient (ignoring over-/under-flow). The state-based counters are more interesting, so we’ll look at those.
A G-Counter only counts up and is basically a version vector (vector clock). Each replica increments its own pair only; the value is computed by summing the counts of all replicas. Convergence is achieved by taking the maximum count for each replica. This is basically the Cassandra counters implementation.
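A minimal state-based G-Counter following the description above. The class and method names are illustrative, and replica IDs are assumed to be unique strings handed out by some external mechanism.

```python
class GCounter:
    """Grow-only counter: one non-decreasing count per replica."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> count contributed by that replica

    def increment(self, amount: int = 1) -> None:
        """Increase only this replica's own entry."""
        if amount < 0:
            raise ValueError("a G-Counter can only count up")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        """The counter's value is the sum over all replicas' entries."""
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Take the per-replica maximum of the two states."""
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment()
a.increment()
b.increment(3)

a.merge(b)
b.merge(a)
b.merge(a)  # merging the same state twice changes nothing

assert a.value() == b.value() == 5
```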
A PN-Counter is composed of two G-Counters: P for increments and N for decrements. The value is the difference between the values of the two G-Counters. The resolution is the pairwise resolution of the P and N counters.
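The same idea extends to a PN-Counter. In this sketch the two G-Counters are kept as plain dictionaries `p` and `n` (names taken from the P and N above; everything else is illustrative).

```python
class PNCounter:
    """Counter supporting decrements, built from two grow-only count maps."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.p = {}  # replica id -> total increments by that replica
        self.n = {}  # replica id -> total decrements by that replica

    def increment(self, amount: int = 1) -> None:
        self.p[self.replica_id] = self.p.get(self.replica_id, 0) + amount

    def decrement(self, amount: int = 1) -> None:
        self.n[self.replica_id] = self.n.get(self.replica_id, 0) + amount

    def value(self) -> int:
        """Difference between the P total and the N total."""
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other: "PNCounter") -> None:
        """Merge P with P and N with N, taking per-replica maxima."""
        for mine, theirs in ((self.p, other.p), (self.n, other.n)):
            for rid, count in theirs.items():
                mine[rid] = max(mine.get(rid, 0), count)


a, b = PNCounter("a"), PNCounter("b")
a.increment(5)
b.decrement(2)
a.merge(b)
b.merge(a)

assert a.value() == b.value() == 3
```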
Sets constitute one of the most basic data structures. Containers, Maps, and Graphs are all based on Sets. There are two operations, add and remove.
Like a G-Counter, a G-Set only grows in size. That is, it doesn’t allow removal; its merge operation is a simple set-union, returning the maximal grouping without duplicates. Since add commutes with union, a G-Set can also be implemented as a commutative type. It’s not an incredibly useful data type on its own, but it can be part of another data structure.
The second type of Set is a two-phase set (2P-Set), where a removed set member cannot be re-added. It is basically two G-Sets, one for add and one for remove. The removal set is sometimes called a tombstone set. To prevent spurious states (e.g. remove-before-add, making add have no effect), it has a precondition for remove that the local state must already contain the member.

A special case of the 2P-Set is the U-Set. If the system can reasonably guarantee uniqueness, that is, the element will never be added again after removal, then the tombstone set is unnecessary. Uniqueness could be satisfied with a Lamport clock or suitably large RNG space.
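A state-based 2P-Set can be sketched as a pair of Python sets. The class below is illustrative; raising `KeyError` is just one way to express the remove precondition described above.

```python
class TwoPhaseSet:
    """2P-Set: an add set plus a tombstone set; removal is permanent."""

    def __init__(self):
        self.added = set()
        self.removed = set()  # tombstones

    def add(self, element) -> None:
        self.added.add(element)

    def contains(self, element) -> bool:
        return element in self.added and element not in self.removed

    def remove(self, element) -> None:
        """Precondition: the local state must already contain the element."""
        if not self.contains(element):
            raise KeyError(element)
        self.removed.add(element)

    def value(self) -> set:
        return self.added - self.removed

    def merge(self, other: "TwoPhaseSet") -> None:
        """Union the add sets and union the tombstone sets."""
        self.added |= other.added
        self.removed |= other.removed


a, b = TwoPhaseSet(), TwoPhaseSet()
a.add("x")
a.add("y")
b.merge(a)
b.remove("x")
a.add("x")   # re-adding is allowed locally, but the tombstone wins after merge
a.merge(b)
b.merge(a)

assert a.value() == b.value() == {"y"}
```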
Tag each element in A and R with a timestamp. The greatest timestamp wins out for each individual element. This could be implemented with Cassandra super-columns.

Figure 12: LWW-element-Set; elements masked by one with a higher timestamp are elided (state-based)
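A sketch of a state-based LWW-element-Set. Timestamps are passed in explicitly as plain numbers, and an element whose add and remove timestamps tie is treated as present; that tie-break and all the names are assumptions made for this example.

```python
class LWWElementSet:
    """Set where the greatest timestamp decides membership per element."""

    def __init__(self):
        self.adds = {}     # element -> latest add timestamp (the A set)
        self.removes = {}  # element -> latest remove timestamp (the R set)

    def add(self, element, timestamp: float) -> None:
        self.adds[element] = max(self.adds.get(element, timestamp), timestamp)

    def remove(self, element, timestamp: float) -> None:
        self.removes[element] = max(self.removes.get(element, timestamp), timestamp)

    def contains(self, element) -> bool:
        """Present if added and not removed at a strictly later timestamp."""
        if element not in self.adds:
            return False
        return self.removes.get(element, float("-inf")) <= self.adds[element]

    def value(self) -> set:
        return {e for e in self.adds if self.contains(e)}

    def merge(self, other: "LWWElementSet") -> None:
        """Keep the greatest timestamp per element in both A and R."""
        for mine, theirs in ((self.adds, other.adds), (self.removes, other.removes)):
            for element, ts in theirs.items():
                mine[element] = max(mine.get(element, ts), ts)


a, b = LWWElementSet(), LWWElementSet()
a.add("x", 1)
a.add("y", 1)
b.merge(a)
b.remove("x", 2)
b.remove("y", 5)
a.add("x", 3)    # later than the remove of "x", so the add wins
a.merge(b)
b.merge(a)

assert a.value() == b.value() == {"x"}
```

Unlike the 2P-Set, an element can come back after removal, as long as the new add carries a greater timestamp.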
Tag each added element uniquely (without exposing the tags). When removing, remove all seen tags for the element and forward the operation downstream with those tags. A state-based version would be based on U-Set.
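One way to sketch the tagging idea in state-based form. This is an assumption-heavy illustration: tags are random UUIDs from Python's `uuid` module, and removed (element, tag) pairs are kept as tombstones so that merge stays a plain union, which is simpler than, and not identical to, a U-Set-based design.

```python
import uuid


class ORSet:
    """Observed-remove set: a remove only affects the tags it has seen."""

    def __init__(self):
        self.added = set()    # (element, tag) pairs ever added
        self.removed = set()  # (element, tag) pairs observed and removed

    def add(self, element) -> None:
        """Tag each add uniquely; the tag is never exposed to callers."""
        self.added.add((element, uuid.uuid4().hex))

    def remove(self, element) -> None:
        """Tombstone every tag currently seen locally for this element."""
        self.removed |= {(e, tag) for (e, tag) in self.added if e == element}

    def value(self) -> set:
        return {e for (e, _tag) in self.added - self.removed}

    def merge(self, other: "ORSet") -> None:
        self.added |= other.added
        self.removed |= other.removed


a, b = ORSet(), ORSet()
a.add("x")
b.merge(a)
b.remove("x")   # removes only the tag b has observed
a.add("x")      # concurrent re-add creates a fresh, unseen tag
a.merge(b)
b.merge(a)
assert a.value() == b.value() == {"x"}   # the concurrent add survives

a.remove("x")   # now every tag has been observed
b.merge(a)
assert a.value() == b.value() == set()
```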
You might notice we’re going up in complexity here in terms of the types of data structures. Graphs are incredibly useful for many problems, but also have a bunch of potential anomalies within them: concurrent adds and removes of vertices and edges may not converge, that is, global invariants can’t be guaranteed. This happens, for example, in a DAG or linked list where elements can be removed or added concurrently. Some anomalies may be removed via restricting the semantics, for example, making a graph add-only. I’m not going to go into detail about how Graphs are implemented, but a simple one is the 2P2P graph, based on a pair of 2P-Sets, one for vertices and one for edges. In the case where a vertex is removed, the most reliable (and intuitive) solution is to remove all attached edges, thus a 2P-Set paradigm works well for the components of a generic graph.
CRDTs tend to create a lot of garbage: tombstones grow and internal structures become unbalanced. In general, garbage collection is extremely difficult to do without synchronization. Luckily, this doesn’t impact correctness, only efficiency and performance.
Client - have to come up with a common representation across languages, allocation of actor IDs is problematic, can only use state-based CRDTs.
Server - no one implements them yet, really (Cassandra’s counter has some anomalies), but we’re working hard to bring them to Riak.
Editor's Notes