This document contains the slides from a presentation on data consistency. It discusses various consistency models, including ACID, the CAP theorem, and linearizability. It emphasizes that each application has its own consistency rules and that developers must understand where inconsistencies can arise in the tools and systems the application is built on. The presentation concludes by advising developers to analyze the properties of their systems and to understand how to minimize the impact of potential inconsistencies.
2. @ljacomet#DevoxxPL
Who is that guy?
• Louis Jacomet / @ljacomet
• Principal Software Engineer at Software AG / Terracotta since 2013
• A developer closer to his forties who did not fully manage to dodge all things management
• Interests range from concurrency to API design, with learning new things as a driving factor
• Part of the Devoxx family as a member of the Devoxx Belgium program committee
Why this talk?
• Been presenting on caching for a while now
• Focus usually on
  • performance gains
  • ease of use
  • integration
• Mostly silent on consistency issues
• Distributed systems, with or without microservices, are really trendy
Why this talk?
• Some tools sound like magic
• Which makes for hard wake-up calls when production disasters happen
• Building on the shoulders of giants does not mean you should not look at the giant!
Consistency?
• Defines a model with a set of rules, and being consistent means the rules are respected
• Examples:
  • serial execution in a program thread
  • or the Java memory model
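To make “a set of rules” concrete for the Java memory model, here is a minimal sketch of one of its rules: a write to a `volatile` field happens-before every later read of that field that observes it, so a reader that sees the flag also sees the plain write made before it. The class and field names are illustrative only.

```java
// Sketch of one Java memory model rule: a write to a volatile field
// happens-before every later read of that field that observes it.
class VolatilePublication {
    private int data;                 // plain field, published through the flag
    private volatile boolean ready;   // volatile flag provides the ordering

    void writer() {
        data = 42;      // (1) plain write
        ready = true;   // (2) volatile write, ordered after (1)
    }

    int reader() {
        while (!ready) {
            Thread.onSpinWait();      // spin until the volatile write becomes visible
        }
        return data;    // having read ready == true guarantees this sees 42
    }

    static int run() {
        VolatilePublication p = new VolatilePublication();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = p.reader());
        Thread writer = new Thread(p::writer);
        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();            // join() also makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 42
    }
}
```

Without `volatile` on `ready`, the rules no longer order the two writes for the reader: it could spin forever or return 0, and both outcomes would be allowed by the model.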
Data Consistency and ACID
• From the database world:
“Consistency in database systems refers to the requirement that any given database transaction must change affected data only in allowed ways. Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof.”
https://en.wikipedia.org/wiki/Consistency_(database_systems)
C and I
• Isolation
  • Concurrent execution of transactions results in the system state that would be obtained if they were executed serially
  • 4 levels of isolation, 3 read phenomena in ANSI SQL
• Consistency and Isolation are related properties
• Usually configurable
Isolation levels vs Read phenomena
Isolation level    Dirty reads   Non-repeatable reads   Phantom reads
Read uncommitted   X             X                      X
Read committed     -             X                      X
Repeatable read    -             -                      X
Serialisable       -             -                      -
X = the phenomenon may occur; - = it is prevented
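The same table can be encoded in Java, which makes it easy to ask for the weakest level that rules out the phenomena an application cannot tolerate. `Phenomenon`, `IsolationLevel` and `weakestPreventing` are hypothetical names for this sketch; only the `java.sql.Connection.TRANSACTION_*` constants are real JDBC API (they are the values accepted by `Connection.setTransactionIsolation`). Actual databases may prevent more than the ANSI minimum shown in the table.

```java
import java.sql.Connection;
import java.util.EnumSet;
import java.util.Set;

// ANSI SQL read phenomena from the isolation table.
enum Phenomenon { DIRTY_READ, NON_REPEATABLE_READ, PHANTOM_READ }

// One constant per row of the table: the JDBC level and the phenomena it still permits.
// Constants are declared from weakest to strongest.
enum IsolationLevel {
    READ_UNCOMMITTED(Connection.TRANSACTION_READ_UNCOMMITTED, EnumSet.allOf(Phenomenon.class)),
    READ_COMMITTED(Connection.TRANSACTION_READ_COMMITTED,
            EnumSet.of(Phenomenon.NON_REPEATABLE_READ, Phenomenon.PHANTOM_READ)),
    REPEATABLE_READ(Connection.TRANSACTION_REPEATABLE_READ, EnumSet.of(Phenomenon.PHANTOM_READ)),
    SERIALIZABLE(Connection.TRANSACTION_SERIALIZABLE, EnumSet.noneOf(Phenomenon.class));

    final int jdbcLevel;                 // value for Connection.setTransactionIsolation
    private final Set<Phenomenon> permitted;

    IsolationLevel(int jdbcLevel, Set<Phenomenon> permitted) {
        this.jdbcLevel = jdbcLevel;
        this.permitted = permitted;
    }

    boolean permits(Phenomenon p) {
        return permitted.contains(p);
    }

    // Weakest level under which none of the forbidden phenomena can occur.
    static IsolationLevel weakestPreventing(Set<Phenomenon> forbidden) {
        for (IsolationLevel level : values()) {
            if (forbidden.stream().noneMatch(level::permits)) {
                return level;
            }
        }
        throw new AssertionError("unreachable: SERIALIZABLE permits no phenomenon");
    }
}
```

For example, `IsolationLevel.weakestPreventing(EnumSet.of(Phenomenon.NON_REPEATABLE_READ))` returns `REPEATABLE_READ`.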
Isolation levels in Java
• Configurable in your data source
• Frameworks may offer configuration
• When pooling connections, the usual option is a single isolation level for the whole pool
  • Use multiple pools for multiple levels
  • See Spring’s support for an example
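A minimal sketch of “one pool per isolation level”, assuming a hypothetical `Pool` type standing in for a real connection pool (in practice something like a `DataSource` whose connections all use one default isolation level). Spring’s `IsolationLevelDataSourceRouter` follows a similar idea; the sketch below is independent of it. The router refuses requests for a level no pool was configured for, rather than silently running at the wrong level.

```java
import java.sql.Connection;
import java.util.HashMap;
import java.util.Map;

// Routes work to the pool dedicated to the isolation level it needs.
final class IsolationRouter {
    private final Map<Integer, Pool> poolsByLevel = new HashMap<>();

    IsolationRouter register(Pool pool) {
        Pool previous = poolsByLevel.putIfAbsent(pool.isolationLevel, pool);
        if (previous != null) {
            throw new IllegalArgumentException("level already served by " + previous.name);
        }
        return this;
    }

    // Fail fast rather than silently running at a different isolation level.
    Pool poolFor(int isolationLevel) {
        Pool pool = poolsByLevel.get(isolationLevel);
        if (pool == null) {
            throw new IllegalStateException("no pool configured for isolation level " + isolationLevel);
        }
        return pool;
    }

    public static void main(String[] args) {
        IsolationRouter router = new IsolationRouter()
                .register(new Pool("reporting", Connection.TRANSACTION_READ_COMMITTED))
                .register(new Pool("ledger", Connection.TRANSACTION_SERIALIZABLE));
        System.out.println(router.poolFor(Connection.TRANSACTION_SERIALIZABLE).name); // ledger
    }
}

// Hypothetical stand-in for a connection pool whose connections are all
// configured with one java.sql.Connection.TRANSACTION_* level.
final class Pool {
    final String name;
    final int isolationLevel;

    Pool(String name, int isolationLevel) {
        this.name = name;
        this.isolationLevel = isolationLevel;
    }
}
```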
Hibernate Caching Strategies
• 4 different strategies
  • Read only
  • Non strict read write
  • Read write
  • Transactional
Non strict read write
• Opens a window of inconsistency by using invalidation
• Cache entries are invalidated before and after transaction completion
• This means that a concurrent transaction could end up loading an outdated value into the cache during that window
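The window can be replayed deterministically. This is a simplified, single-threaded model of invalidation-based caching that uses two plain maps for the cache and the database; it is not Hibernate’s implementation. The transaction labels T1 to T4 and the `price` key are made up, and the step order is chosen by hand to fall inside the window.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of invalidation-based ("non strict read write") caching.
final class NonStrictWindow {
    final Map<String, String> database = new HashMap<>();
    final Map<String, String> cache = new HashMap<>();

    // Read path: serve from the cache, or load the committed value and cache it.
    String read(String key) {
        return cache.computeIfAbsent(key, database::get);
    }

    // Replays one unlucky interleaving and records what each reader saw.
    static List<String> replay() {
        NonStrictWindow s = new NonStrictWindow();
        s.database.put("price", "10");
        List<String> observed = new ArrayList<>();

        s.cache.remove("price");          // T1: invalidate before completion
        observed.add(s.read("price"));    // T2: miss, loads the old "10" and caches it
        s.database.put("price", "12");    // T1: commits the new value
        observed.add(s.read("price"));    // T3: hit, still "10" although "12" is committed
        s.cache.remove("price");          // T1: invalidate after completion
        observed.add(s.read("price"));    // T4: miss, loads "12"
        return observed;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // [10, 10, 12]
    }
}
```

The second read is the inconsistency: the database already holds `12`, yet the cache serves `10` until the after-completion invalidation.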
Read write
• Resolves inconsistencies by using soft locks
• Cached items can only be read by transactions started after the item’s creation
• Invalidated entries can only be replaced by a transaction with a timestamp later than that of the transaction that invalidated the mapping
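The two soft-lock rules can be sketched the same way. This is again a simplified model, with explicit `long` timestamps instead of a clock, a single lock per key, and hypothetical names (`SoftLockCache`, `putLoaded`); Hibernate’s actual access-strategy code is more involved.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of soft-lock based "read write" caching; not Hibernate's code.
// Timestamps are passed in by the caller to keep the sketch deterministic.
final class SoftLockCache {
    private static final class Entry {
        final String value;     // null for a soft lock or an invalidation marker
        final long timestamp;   // creation time of the value, or time of lock/unlock
        final boolean locked;

        Entry(String value, long timestamp, boolean locked) {
            this.value = value;
            this.timestamp = timestamp;
            this.locked = locked;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // A cached value is only readable by transactions started after its creation.
    String get(String key, long txStart) {
        Entry e = entries.get(key);
        return (e != null && e.value != null && txStart > e.timestamp) ? e.value : null;
    }

    // A writer soft-locks the entry before changing the database...
    void lock(String key, long now) {
        entries.put(key, new Entry(null, now, true));
    }

    // ...and releases it after completion, leaving an invalidation marker.
    void unlock(String key, long now) {
        entries.put(key, new Entry(null, now, false));
    }

    // A value loaded from the database may only replace the entry if the loading
    // transaction started after the transaction that invalidated it.
    boolean putLoaded(String key, String value, long txStart, long now) {
        Entry e = entries.get(key);
        if (e != null && (e.locked || txStart <= e.timestamp)) {
            return false;       // stale or concurrent load refused
        }
        entries.put(key, new Entry(value, now, false));
        return true;
    }

    public static void main(String[] args) {
        SoftLockCache cache = new SoftLockCache();
        // T1 starts at time 1 and reads "10" from the database (cache miss).
        cache.lock("price", 2);     // T2 soft-locks, then updates the database to "12"
        cache.unlock("price", 3);   // T2 commits and releases the lock
        System.out.println(cache.putLoaded("price", "10", 1, 4)); // false: T1's load is stale
        System.out.println(cache.putLoaded("price", "12", 5, 5)); // true: T3 started after T2
        System.out.println(cache.get("price", 6));                // 12
    }
}
```

The stale load from the transaction that started at time 1 is refused, so the old value never reaches the cache; only a transaction that started after the lock was released can populate it.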
Not the whole story …
• Researchers have since identified more phenomena and defined more isolation levels accordingly
• Examples:
  • read skew or write skew phenomena
  • snapshot or cursor stability isolation levels
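Write skew is easiest to see with a tiny model of snapshot isolation: each transaction reads from its own snapshot, and a commit is rejected only if another transaction committed to one of the same keys after that snapshot was taken. The on-call rule (“at least one doctor must stay on call”) and all names are a hypothetical illustration, and real snapshot isolation implementations detect conflicts in different ways.

```java
import java.util.HashMap;
import java.util.Map;

// Toy snapshot isolation: reads come from a private snapshot, and a commit is rejected
// only when another transaction committed to one of the same keys after that snapshot.
final class WriteSkewDemo {
    final Map<String, Boolean> onCall = new HashMap<>();    // committed state
    final Map<String, Long> lastCommit = new HashMap<>();   // key -> version of last commit
    long version = 0;

    final class Tx {
        final long startVersion = version;
        final Map<String, Boolean> snapshot = new HashMap<>(onCall);
        final Map<String, Boolean> writes = new HashMap<>();

        long countOnCall() {
            return snapshot.values().stream().filter(b -> b).count();
        }

        void put(String key, boolean value) {
            writes.put(key, value);
        }

        boolean commit() {
            for (String key : writes.keySet()) {
                if (lastCommit.getOrDefault(key, 0L) > startVersion) {
                    return false;                           // write-write conflict
                }
            }
            version++;
            for (Map.Entry<String, Boolean> w : writes.entrySet()) {
                onCall.put(w.getKey(), w.getValue());
                lastCommit.put(w.getKey(), version);
            }
            return true;
        }
    }

    // Hypothetical rule: at least one doctor must stay on call.
    static long doctorsOnCallAfterSkew() {
        WriteSkewDemo db = new WriteSkewDemo();
        db.onCall.put("alice", true);
        db.onCall.put("bob", true);
        Tx t1 = db.new Tx();
        Tx t2 = db.new Tx();
        if (t1.countOnCall() >= 2) t1.put("alice", false);  // "bob is still on call"
        if (t2.countOnCall() >= 2) t2.put("bob", false);    // "alice is still on call"
        boolean bothCommitted = t1.commit() && t2.commit(); // disjoint keys: no conflict
        if (!bothCommitted) throw new IllegalStateException("expected both commits to succeed");
        return db.onCall.values().stream().filter(b -> b).count();
    }

    public static void main(String[] args) {
        System.out.println(doctorsOnCallAfterSkew()); // 0: the rule is broken
    }
}
```

Both commits succeed because the write sets are disjoint, yet the rule each transaction checked no longer holds afterwards. A serializable level has to reject one of the two commits, which is exactly what snapshot isolation does not do.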
CAP definitions
• Availability
“Availability means that every request to a non-failing node must complete successfully. Since network partitions are allowed to last arbitrarily long, this means that nodes cannot simply defer responding until after the partition heals.”
https://aphyr.com/posts/313-strong-consistency-models
CAP definitions
• Partition (tolerance)
“Partition tolerance means that partitions can happen. Providing consistency and availability when the network is reliable is easy. Providing both when the network is not reliable is provably impossible. If your network is not perfectly reliable–and it isn’t–you cannot choose CA. This means that all practical distributed systems on commodity hardware can guarantee, at maximum, either AP or CP.”
https://aphyr.com/posts/313-strong-consistency-models
CAP definitions
• (Atomic) Consistency
“Consistency means linearizability, and in particular, a linearizable register. Registers are equivalent to other systems, including sets, lists, maps, relational databases, and so on, so the theorem can be extended to cover all kinds of linearizable systems.”
https://aphyr.com/posts/313-strong-consistency-models
Defining Linearizability
• Back to consistency: the term, not the definition
• Defines a model with a set of rules, and being consistent means the rules are respected
Defining Linearizability
• Operations span time
  • Luckily, this time is finite
  • From the beginning to the end of the operation
• The effect may become visible at any single instant during that span
• Let’s call that instant the linearisation point
So what is Linearizability?
• If ordering the operations by their linearisation points yields a valid sequential history, then linearizability is achieved
• An operation whose response precedes another operation’s invocation must still precede it in that reordering
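The definition translates almost directly into a brute-force checker for a single read/write register: search for an ordering of all operations that never places an operation before one that had already returned when it was invoked, and in which every read returns the latest written value. This is an exponential sketch only suitable for tiny histories; the names and the integer timestamps are assumptions of this example.

```java
import java.util.ArrayList;
import java.util.List;

// Brute-force linearizability check for a single read/write register.
// Exponential in the number of operations: only meant for tiny histories.
final class Linearizability {
    static final class Op {
        final boolean isWrite;
        final int value;       // value written, or value returned by the read
        final long invoke;     // instant the call started
        final long response;   // instant the call returned

        private Op(boolean isWrite, int value, long invoke, long response) {
            this.isWrite = isWrite;
            this.value = value;
            this.invoke = invoke;
            this.response = response;
        }

        static Op write(int value, long invoke, long response) {
            return new Op(true, value, invoke, response);
        }

        static Op read(int value, long invoke, long response) {
            return new Op(false, value, invoke, response);
        }
    }

    // Looks for a sequential order that keeps real-time precedence and register semantics.
    static boolean isLinearizable(List<Op> history, int initialValue) {
        return search(new ArrayList<>(history), initialValue);
    }

    private static boolean search(List<Op> remaining, int current) {
        if (remaining.isEmpty()) {
            return true;
        }
        for (int i = 0; i < remaining.size(); i++) {
            Op op = remaining.get(i);
            if (!canGoNext(op, remaining)) continue;            // another op must precede it
            if (!op.isWrite && op.value != current) continue;   // read would be illegal here
            remaining.remove(i);
            boolean found = search(remaining, op.isWrite ? op.value : current);
            remaining.add(i, op);                                // backtrack
            if (found) {
                return true;
            }
        }
        return false;
    }

    // op may be placed next only if no pending op returned before op was invoked.
    private static boolean canGoNext(Op op, List<Op> remaining) {
        for (Op other : remaining) {
            if (other != op && other.response < op.invoke) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // the read overlaps write(1), so it may return the new value
        System.out.println(isLinearizable(List.of(Op.write(1, 0, 2), Op.read(1, 1, 3)), 0)); // true
        // stale read: write(1) completed before the read started
        System.out.println(isLinearizable(List.of(Op.write(1, 0, 1), Op.read(0, 2, 3)), 0)); // false
        // non-monotonic reads during a long-running write(1)
        System.out.println(isLinearizable(
                List.of(Op.write(1, 0, 10), Op.read(1, 1, 2), Op.read(0, 3, 4)), 0)); // false
    }
}
```

The last history shows why non-monotonic reads are ruled out: once one read has returned 1, a later read returning 0 would have to be ordered both after and before the write.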
So in practice?
• Powerful consequences:
  • Completed operations must be visible
  • Stale and non-monotonic reads are prohibited
• Stackable model
  • You can build higher-level linearizable objects on top of linearizable ones
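A small illustration of the stackable property, taking `java.util.concurrent.atomic.AtomicLong` as the linearizable building block: a hypothetical `MaxRegister` that only ever keeps the largest offered value. Each `offer` takes effect at a single point (the successful `compareAndSet`, or the read showing no update is needed), so the composite object is itself linearizable, and since its value only grows, readers never see it go backwards.

```java
import java.util.concurrent.atomic.AtomicLong;

// A "max register" (remembers the largest value ever offered) built on AtomicLong,
// whose operations are linearizable; the composite is linearizable in turn.
final class MaxRegister {
    private final AtomicLong max;

    MaxRegister(long initial) {
        max = new AtomicLong(initial);
    }

    long get() {
        return max.get();                  // linearisation point: the atomic read
    }

    void offer(long candidate) {
        while (true) {
            long current = max.get();
            if (candidate <= current) {
                return;                    // linearisation point: the read above
            }
            if (max.compareAndSet(current, candidate)) {
                return;                    // linearisation point: the successful CAS
            }
            // a concurrent offer won the race: re-read and retry
        }
    }

    // Threads offer interleaved values; the result must be the overall maximum.
    static long runConcurrently(int threads, int perThread) {
        MaxRegister register = new MaxRegister(Long.MIN_VALUE);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    register.offer((long) i * threads + id);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return register.get();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 10_000)); // 39999
    }
}
```

`AtomicLong.accumulateAndGet(candidate, Math::max)` packages the same retry loop; it is spelled out here to show where the linearisation points are.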
But why does it work then?
• Your application may never trigger these issues
  • Not enough concurrency
  • Higher consistency provided by the application logic
  • Repairing inconsistencies is part of the business process
What about your application?
• It probably cares about neither of these definitions as such
• Instead, it defines its own set of rules and must be consistent with regard to those
Composing systems
• An application is built from multiple pieces
  • Storage, eventing, messaging
  • Services, distributed or not
  • UIs on different platforms with different partition characteristics
Ehcache resilience strategy
• For in-memory caches, the cache should always be consistent
• With write-through, a failure to write means the entry is not in the cache
• With write-behind, a failure to write will invalidate the cache entry
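The two rules can be sketched as follows. `BackingStore` and `ResilientCache` are hypothetical types written for this illustration and do not mirror Ehcache’s API; the sketch only shows the invariant that a value the system of record rejected must not stay in the cache.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch of the two failure rules (not Ehcache's implementation).
final class ResilientCache {
    private final Map<String, String> entries = new HashMap<>();
    private final Deque<String[]> pendingWrites = new ArrayDeque<>(); // {key, value}
    private final BackingStore store;

    ResilientCache(BackingStore store) {
        this.store = store;
    }

    String get(String key) {
        return entries.get(key);   // null means a miss
    }

    // Write-through: write the store first and cache only what it accepted.
    void putWriteThrough(String key, String value) {
        try {
            store.write(key, value);
            entries.put(key, value);
        } catch (RuntimeException e) {
            entries.remove(key);   // failed write: the entry is not in the cache
            throw e;
        }
    }

    // Write-behind: cache immediately, write to the store later.
    void putWriteBehind(String key, String value) {
        entries.put(key, value);
        pendingWrites.add(new String[] {key, value});
    }

    // Drains queued writes; a failed write invalidates the cached entry.
    void flush() {
        while (!pendingWrites.isEmpty()) {
            String[] write = pendingWrites.poll();
            try {
                store.write(write[0], write[1]);
            } catch (RuntimeException e) {
                entries.remove(write[0]);
            }
        }
    }

    public static void main(String[] args) {
        ResilientCache cache = new ResilientCache((k, v) -> {
            throw new IllegalStateException("store unavailable");
        });
        try {
            cache.putWriteThrough("a", "1");
        } catch (IllegalStateException expected) {
            // the caller sees the failure
        }
        cache.putWriteBehind("b", "2");
        System.out.println(cache.get("a") + " " + cache.get("b")); // null 2
        cache.flush();
        System.out.println(cache.get("b")); // null
    }
}

// Hypothetical system of record; signals failure with a RuntimeException.
interface BackingStore {
    void write(String key, String value);
}
```

Invalidating on a failed write-behind is conservative: if the key was overwritten in the meantime, the newer value is dropped from the cache too, and the next read is simply treated as a miss.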
Ehcache resilience strategy
• What about distributed caches?
• The idea is to require users to provide their own conflict resolution strategy
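One possible shape for a user-supplied conflict resolution strategy, assuming each replica keeps a write timestamp and a node id next to the value. `Versioned`, `ConflictResolver` and `Resolvers` are hypothetical names, not Ehcache types; the two strategies shown, last-writer-wins and a set-union merge, are common choices rather than what the talk prescribes.

```java
import java.util.HashSet;
import java.util.Set;

// Two example strategies; which one fits depends on the application's own rules.
final class Resolvers {
    // Last writer wins; ties broken by node id so that every node picks the same winner.
    static <V> ConflictResolver<V> lastWriterWins() {
        return (a, b) -> {
            if (a.timestamp != b.timestamp) {
                return a.timestamp > b.timestamp ? a : b;
            }
            return a.nodeId.compareTo(b.nodeId) >= 0 ? a : b;
        };
    }

    // For set-valued entries, keep the union of both sides instead of discarding one.
    static <T> ConflictResolver<Set<T>> union() {
        return (a, b) -> {
            Set<T> merged = new HashSet<>(a.value);
            merged.addAll(b.value);
            String node = a.nodeId.compareTo(b.nodeId) >= 0 ? a.nodeId : b.nodeId;
            return new Versioned<>(merged, Math.max(a.timestamp, b.timestamp), node);
        };
    }

    public static void main(String[] args) {
        Versioned<String> left = new Versioned<>("blue", 10, "node-a");
        Versioned<String> right = new Versioned<>("green", 12, "node-b");
        ConflictResolver<String> lww = lastWriterWins();
        System.out.println(lww.resolve(left, right).value); // green
    }
}

// A replica's copy of an entry plus the metadata a resolver may rely on (assumed).
final class Versioned<V> {
    final V value;
    final long timestamp;   // write time as recorded by the writing node
    final String nodeId;    // node that performed the write

    Versioned(V value, long timestamp, String nodeId) {
        this.value = value;
        this.timestamp = timestamp;
        this.nodeId = nodeId;
    }
}

// Hypothetical user-supplied strategy: given two divergent copies, decide what survives.
@FunctionalInterface
interface ConflictResolver<V> {
    Versioned<V> resolve(Versioned<V> a, Versioned<V> b);
}
```

Last-writer-wins silently discards one of two concurrent updates and trusts the nodes’ clocks; the union merge keeps both sides but only applies to values that can be merged meaningfully. Whichever strategy is chosen, it should give the same answer regardless of argument order so that all nodes converge.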
Conclusion
• Analyse the properties of the system
  • Your application
  • The tools it is built upon
• Understand where things can go wrong and what the consequences are
• Then decide what to do and how to minimise impacts!
References
• Aphyr and all things Jepsen: https://aphyr.com/posts
• Work from Peter Bailis: http://www.bailis.org/blog/
• Adrian Colyer’s morning paper: https://blog.acolyer.org/
• And more … shoulders of giants, remember?