Data consistency: Analyse, understand and decide

@ljacomet#DevoxxPL
Data Consistency
Analyze, understand and decide
Louis Jacomet
@ljacomet
Principal Software Engineer
Software AG / Terracotta

• Louis Jacomet / @ljacomet
• Principal Software Engineer at Software AG / Terracotta since 2013
• A developer closer to his forties who did not fully manage to
dodge all things management
• Interests range from concurrency to API design, with learning new
things as a driving factor
• Part of the Devoxx family as a program committee member for Belgium
Who is that guy?

• Been presenting on caching for a while now
• Focus usually on
• performance gains,
• ease of use,
• integration
• Mostly silent on consistency issues
• Distributed systems, with or without microservices, are really trendy
Why this talk?

• Some tools sound like magic
• Makes for hard wake-up calls when production disasters happen
• Building on the shoulders of giants does not mean you should
not look at the giant!
Why this talk?

• Consistency as in ACID
• Consistency as in CAP
• And what about your application?
Agenda

• Defines a model with a set of rules, and being consistent
means the rules are respected
• Examples:
• serial execution in a program thread
• or the Java memory model
Consistency?
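
As one illustration of "a model with a set of rules", here is a minimal sketch of a rule from the Java memory model: a write to a volatile field happens-before a later read of that field, so a plain write made before it is visible too. The class and method names are hypothetical.

```java
final class VolatilePublication {
    private int payload;            // plain field, written before the flag
    private volatile boolean ready; // volatile write then read gives happens-before

    void publish(int value) {
        payload = value; // ordered before the volatile write below
        ready = true;
    }

    int awaitPayload() {
        while (!ready) {          // volatile read
            Thread.onSpinWait();
        }
        return payload;           // guaranteed to see the published value
    }

    static int demo() {
        VolatilePublication box = new VolatilePublication();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = box.awaitPayload());
        reader.start();
        box.publish(42);
        try {
            reader.join(); // join makes the reader's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

Without the volatile modifier the model gives no such guarantee, and the reader could spin forever or see a stale payload.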

• From the database world:
“Consistency in database systems refers to the requirement
that any given database transaction must change affected
data only in allowed ways. Any data written to the database
must be valid according to all defined rules, including
constraints, cascades, triggers, and any combination thereof.”
https://en.wikipedia.org/wiki/Consistency_(database_systems)
Data Consistency and ACID

• Isolation
• Concurrent transactions result in the system state that would
be obtained if they were executed serially
• 4 levels of isolation, 3 read phenomena in ANSI SQL
• Consistency and Isolation are related properties
• Usually configurable
C and I

Isolation levels vs Read phenomena (X = phenomenon can occur)
Isolation level  | Dirty reads | Non-repeatable reads | Phantom reads
Read uncommitted | X           | X                    | X
Read committed   |             | X                    | X
Repeatable read  |             |                      | X
Serialisable     |             |                      |

• Configurable in your data source
• Frameworks may offer configuration
• When pooling connections, most often the option is one
isolation level for all
• Use multiple pools for multiple levels
• See Spring support for example
Isolation levels in Java
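
A minimal sketch of the "one pool per isolation level" idea, using only the standard JDBC types. The `IsolationLevels` class, its `Phenomenon` enum and the pool map are assumptions made for illustration; a real pooling library or Spring would normally carry this configuration.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import javax.sql.DataSource;

final class IsolationLevels {
    enum Phenomenon { DIRTY_READ, NON_REPEATABLE_READ, PHANTOM_READ }

    /** Read phenomena that ANSI SQL still permits at a given JDBC isolation level. */
    static Set<Phenomenon> allowedAt(int jdbcLevel) {
        switch (jdbcLevel) {
            case Connection.TRANSACTION_READ_UNCOMMITTED:
                return EnumSet.allOf(Phenomenon.class);
            case Connection.TRANSACTION_READ_COMMITTED:
                return EnumSet.of(Phenomenon.NON_REPEATABLE_READ, Phenomenon.PHANTOM_READ);
            case Connection.TRANSACTION_REPEATABLE_READ:
                return EnumSet.of(Phenomenon.PHANTOM_READ);
            case Connection.TRANSACTION_SERIALIZABLE:
                return EnumSet.noneOf(Phenomenon.class);
            default:
                throw new IllegalArgumentException("Unknown isolation level: " + jdbcLevel);
        }
    }

    // Hypothetical setup: one pool (DataSource) per isolation level.
    private final Map<Integer, DataSource> pools;

    IsolationLevels(Map<Integer, DataSource> pools) {
        this.pools = new HashMap<>(pools);
    }

    /** Borrows a connection from the pool dedicated to the requested level. */
    Connection connectionFor(int jdbcLevel) throws SQLException {
        DataSource pool = pools.get(jdbcLevel);
        if (pool == null) {
            throw new IllegalArgumentException("No pool for isolation level " + jdbcLevel);
        }
        Connection connection = pool.getConnection();
        // Defensive: the pool is assumed to hand out connections at this level already.
        connection.setTransactionIsolation(jdbcLevel);
        return connection;
    }
}
```

Code that needs repeatable reads would call `connectionFor(Connection.TRANSACTION_REPEATABLE_READ)`, while the rest of the application keeps using the default pool.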

• 4 different strategies
• Read only
• Non strict read write
• Read write
• Transactional
Hibernate Caching Strategies

• Opens a window of inconsistency by using invalidation
• Cache entries are invalidated before and after transaction
completion
• Means that a concurrent transaction could end up loading an
outdated value into the cache during that window
Non strict read write
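
The window can be replayed with a single-threaded simulation. This is a simplified sketch, not Hibernate code: two maps stand in for the database and the cache, and the interleaving of the two transactions is written out by hand.

```java
import java.util.HashMap;
import java.util.Map;

final class NonStrictReadWriteDemo {
    private final Map<String, String> database = new HashMap<>(); // committed state only
    private final Map<String, String> cache = new HashMap<>();

    /** Read-through: on a miss, load the committed value and cache it. */
    String read(String key) {
        return cache.computeIfAbsent(key, database::get);
    }

    static String[] run() {
        NonStrictReadWriteDemo system = new NonStrictReadWriteDemo();
        system.database.put("k", "old");

        system.cache.remove("k");                // T1: invalidate before its update
        String seenDuring = system.read("k");    // T2: miss, loads committed "old" into the cache
        system.database.put("k", "new");         // T1: commit
        String seenInWindow = system.read("k");  // cache still answers "old"
        system.cache.remove("k");                // T1: invalidate after completion
        String seenAfter = system.read("k");     // miss, loads "new"

        return new String[] { seenDuring, seenInWindow, seenAfter };
    }
}
```

Between the commit and the second invalidation the cache serves a value the database no longer holds, which is the inconsistency this strategy accepts.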

• Resolves inconsistencies by using soft locks
• Cached items can only be read by transactions started after
the item’s creation
• Invalidated entries can only be replaced by a transaction with
a timestamp after the transaction that invalidated the mapping
Read write
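
The two rules above can be sketched with explicit timestamps. This is a deliberately simplified model, assuming logical clock values passed in by the caller; the `SoftLockCache` class is hypothetical and Hibernate's actual implementation differs in its details.

```java
import java.util.HashMap;
import java.util.Map;

final class SoftLockCache {
    private static final class Entry {
        final String value;   // null for a soft lock or an invalidated marker
        final long timestamp; // creation time of the item, or unlock time of the marker
        final boolean locked;

        Entry(String value, long timestamp, boolean locked) {
            this.value = value;
            this.timestamp = timestamp;
            this.locked = locked;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    /** A writer replaces the item with a soft lock before updating the database. */
    void lock(String key) {
        entries.put(key, new Entry(null, Long.MAX_VALUE, true));
    }

    /** After the writer completes, an invalidated marker remembers when that happened. */
    void unlock(String key, long now) {
        entries.put(key, new Entry(null, now, false));
    }

    /** Rule 1: an item is readable only by transactions started after its creation. */
    String get(String key, long txStart) {
        Entry e = entries.get(key);
        boolean readable = e != null && e.value != null && e.timestamp < txStart;
        return readable ? e.value : null;
    }

    /** Rule 2: only a transaction started after the invalidation may replace the entry. */
    boolean putFromLoad(String key, String value, long txStart, long now) {
        Entry e = entries.get(key);
        if (e != null && (e.locked || txStart <= e.timestamp)) {
            return false;
        }
        entries.put(key, new Entry(value, now, false));
        return true;
    }

    static boolean[] demo() {
        SoftLockCache cache = new SoftLockCache();
        cache.lock("k");                                                  // writer in flight
        boolean duringLock = cache.putFromLoad("k", "stale", 5, 11);      // rejected: locked
        cache.unlock("k", 20);                                            // writer done at t=20
        boolean startedBefore = cache.putFromLoad("k", "stale", 15, 21);  // rejected: started at 15
        boolean startedAfter = cache.putFromLoad("k", "fresh", 25, 26);   // accepted
        boolean hiddenFromOlderTx = cache.get("k", 26) == null;           // item created at 26
        boolean visibleToNewerTx = "fresh".equals(cache.get("k", 30));
        return new boolean[] { duringLock, startedBefore, startedAfter,
                hiddenFromOlderTx, visibleToNewerTx };
    }
}
```

A rejected load is not an error: the transaction simply keeps using the value it read from the database and leaves the cache alone.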

• Researchers have since then identified more phenomena and
thus defined more isolation levels
• Examples:
• read skew or write skew phenomena
• Snapshot or cursor stability isolation levels
Not the whole story …

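Isolation anomalies: Read skew
[Diagram: timeline of T1 and T2 over items x and y, both initially 50. T1 reads x = 50; T2 writes x = 25 and y = 75, then commits; T1 reads y = 75. T1 observes x = 50 and y = 75.]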
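
The read skew interleaving can be replayed step by step. This is a single-threaded sketch using the values from the diagram; the invariant x + y = 100 is an assumption added for illustration.

```java
final class ReadSkewDemo {
    /** Replays the interleaving and returns what T1 observed as {x, y}. */
    static int[] run() {
        int[] store = { 50, 50 }; // committed x and y; assumed invariant: x + y == 100

        int xSeenByT1 = store[0]; // T1 reads x = 50

        store[0] = 25;            // T2 writes x = 25
        store[1] = 75;            // T2 writes y = 75 and commits; invariant still holds

        int ySeenByT1 = store[1]; // T1 reads y = 75

        return new int[] { xSeenByT1, ySeenByT1 };
    }
}
```

T1 ends up with x + y = 125, a state that never existed in the store, even though every value it read was committed.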

Isolation anomalies: Write skew
[Diagram: timeline of T1 and T2 over items x and y. The reads shown return x = 30 and y = 10; the values 50 and 60 appear alongside the writes; both transactions write and commit.]
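
A minimal sketch of write skew under snapshot reads. Only the starting values x = 30 and y = 10 come from the diagram; the invariant (x + y must not go negative) and the withdrawals of 40 are hypothetical, chosen to make the anomaly visible.

```java
final class WriteSkewDemo {
    /** Withdraws from one item if the snapshot says the combined balance allows it. */
    static int withdrawIfAllowed(int[] snapshot, int item, int amount) {
        int total = snapshot[0] + snapshot[1];
        return total - amount >= 0 ? snapshot[item] - amount : snapshot[item];
    }

    /** Replays the interleaving and returns the committed {x, y}. */
    static int[] run() {
        int[] committed = { 30, 10 };          // x and y
        int[] snapshotT1 = committed.clone();  // T1 reads x and y
        int[] snapshotT2 = committed.clone();  // T2 reads x and y

        int newX = withdrawIfAllowed(snapshotT1, 0, 40); // T1: 40 - 40 >= 0, so x becomes -10
        int newY = withdrawIfAllowed(snapshotT2, 1, 40); // T2: same check, so y becomes -30

        committed[0] = newX; // T1 commits
        committed[1] = newY; // T2 commits: different item, no write-write conflict detected
        return committed;
    }
}
```

Each transaction is correct on its own snapshot, and their write sets do not overlap, yet the combined result x + y = -40 breaks the invariant both of them checked.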

http://learnyousomeerlang.com/distribunomicon

• Availability
“Availability means that every request to a non-failing
node must complete successfully. Since network
partitions are allowed to last arbitrarily long, this means that
nodes cannot simply defer responding until after the partition
heals.”
https://aphyr.com/posts/313-strong-consistency-models
CAP definitions

• Partition (tolerance)
“Partition tolerance means that partitions can happen.
Providing consistency and availability when the network is reliable
is easy. Providing both when the network is not reliable is provably
impossible. If your network is not perfectly reliable–and it isn’t–you
cannot choose CA. This means that all practical distributed systems
on commodity hardware can guarantee, at maximum, either AP or
CP.”
CAP definitions

• (Atomic) Consistency
“Consistency means linearizability, and in particular, a
linearizable register. Registers are equivalent to other systems,
including sets, lists, maps, relational databases, and so on, so the
theorem can be extended to cover all kinds of linearizable
systems.”
CAP definitions

• Back to consistency - the term, not the definition
• Defines a model with a set of rules, and being consistent
means the rules are respected
Defining Linearizability

• Operations span time
• Luckily, this time is finite
• From the beginning to the end of the operation
• Effect could be visible at any time during that span
• Let’s call that the linearisation point
Defining Linearizability

• If there is a valid sequential history of operations using the
linearisation point, then linearizability is achieved
• Knowing that a response preceding an invocation must still
precede it in the reordering.
So what is Linearizability?

[Diagram, shown in three successive arrangements: A invokes lock, B invokes lock, A “fails” to lock, B “gets” lock]
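
The lock scenario can be sketched with a compare-and-set, which takes effect at a single instant between invocation and response: its linearisation point. The `TryLockDemo` class is hypothetical; it only shows that two overlapping attempts always resolve to one winner, whichever thread invoked first.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class TryLockDemo {
    private final AtomicBoolean held = new AtomicBoolean(false);

    /** The compareAndSet is the linearisation point of this operation. */
    boolean tryLock() {
        return held.compareAndSet(false, true);
    }

    /** Lets threads A and B race for the lock and returns how many acquired it. */
    static int winners() {
        TryLockDemo lock = new TryLockDemo();
        AtomicInteger winners = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);

        Runnable contender = () -> {
            try {
                start.await(); // make the two invocations overlap
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (lock.tryLock()) {
                winners.incrementAndGet();
            }
        };

        Thread a = new Thread(contender, "A");
        Thread b = new Thread(contender, "B");
        a.start();
        b.start();
        start.countDown();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return winners.get();
    }
}
```

Which thread wins can differ between runs, but the history is always equivalent to some sequential order in which one `tryLock` succeeded and the other failed.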

• Powerful consequences:
• Completed operations must be visible
• Stale and non monotonic reads are prohibited
• Stackable model
• You can build higher level linearizability on top of
linearizability
So in practice?

• Attracted (public) attention to weaker consistency models
• By relaxing constraints, you can be C’A’P
Consequences of CAP

[Diagram: Cache; three Terracotta clients; one Terracotta server]

http://hackingdistributed.com/2013/03/23/consistency-alphabet-soup/

• Your application may never trigger these issues
• Not enough concurrency
• Higher consistency provided by the application logic
• Repair of inconsistencies is part of the business process
But why does it work then?

• It probably cares about neither
• Instead it defines its own set of rules and must be consistent
with regard to those
What about your application?

• An application is built of multiple pieces
• Storage, eventing, messaging
• Services, distributed or not
• UIs on different platforms with different partition
characteristics
Composing systems

• Proposition:
“A cache should never be the cause of an application error”
Ehcache resilience strategy

• For in-memory, the cache should always be consistent
• With write through, a failure to write means the entry is not
in the cache
• With write-behind, a failure to write will invalidate the cache
entry
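
A minimal sketch of the proposition, assuming a hypothetical `Store` interface whose operations may throw. It is not the Ehcache API: it only illustrates falling back to the system of record when the cache fails, and dropping the entry when a write-through fails.

```java
import java.util.function.BiConsumer;
import java.util.function.Function;

final class ResilientCache<K, V> {
    /** Hypothetical cache backend; any operation may throw a RuntimeException. */
    interface Store<K, V> {
        V get(K key);
        void put(K key, V value);
        void remove(K key);
    }

    private final Store<K, V> store;
    private final Function<K, V> loader;   // reads the system of record
    private final BiConsumer<K, V> writer; // writes the system of record

    ResilientCache(Store<K, V> store, Function<K, V> loader, BiConsumer<K, V> writer) {
        this.store = store;
        this.loader = loader;
        this.writer = writer;
    }

    /** A cache failure degrades to a direct load instead of an application error. */
    V get(K key) {
        try {
            V cached = store.get(key);
            if (cached != null) {
                return cached;
            }
        } catch (RuntimeException cacheFailure) {
            return loader.apply(key);
        }
        V loaded = loader.apply(key);
        if (loaded != null) {
            try {
                store.put(key, loaded);
            } catch (RuntimeException cacheFailure) {
                quietly(() -> store.remove(key));
            }
        }
        return loaded;
    }

    /** Write-through: if the write fails, the entry must not remain in the cache. */
    void put(K key, V value) {
        try {
            writer.accept(key, value);
        } catch (RuntimeException writeFailure) {
            quietly(() -> store.remove(key));
            throw writeFailure;
        }
        try {
            store.put(key, value);
        } catch (RuntimeException cacheFailure) {
            quietly(() -> store.remove(key)); // avoid leaving a stale entry behind
        }
    }

    private static void quietly(Runnable cacheOperation) {
        try {
            cacheOperation.run();
        } catch (RuntimeException ignored) {
            // the cache is already failing; nothing more can be done here
        }
    }

    static String demo() {
        Store<String, String> broken = new Store<String, String>() {
            public String get(String key) { throw new IllegalStateException("cache down"); }
            public void put(String key, String value) { throw new IllegalStateException("cache down"); }
            public void remove(String key) { throw new IllegalStateException("cache down"); }
        };
        ResilientCache<String, String> cache =
                new ResilientCache<>(broken, key -> "loaded:" + key, (key, value) -> { });
        return cache.get("k");
    }
}
```

With a completely broken store, `demo()` still returns the value from the loader: the application gets slower, not wrong.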

• What about distributed caches?
• Idea is to require users to provide their conflict resolution
strategy
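
What a user-provided strategy could look like, as a sketch only: the `Versioned` type and the "highest version wins" rule are assumptions for illustration, not an existing Ehcache contract.

```java
import java.util.function.BinaryOperator;

final class ConflictResolution {
    /** Hypothetical cached value carrying a version assigned by the application. */
    static final class Versioned {
        final String value;
        final long version;

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    /** One possible strategy: keep whichever copy has the higher version. */
    static final BinaryOperator<Versioned> HIGHEST_VERSION_WINS =
            (local, remote) -> remote.version > local.version ? remote : local;

    /** Called when a client and the server disagree, for example after a partition heals. */
    static Versioned reconcile(Versioned local, Versioned remote,
                               BinaryOperator<Versioned> resolver) {
        return resolver.apply(local, remote);
    }
}
```

Passing the resolver as a parameter keeps the decision with the application, which is the only place that knows whether "latest wins", a merge, or a reload from the source of truth is acceptable.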

• Analyse the properties of the system
• Your application
• The tools it is built upon
• Understand where things can go wrong and what the
consequences are
• Then decide what to do and how to minimise impacts!
Conclusion

• Aphyr and all things Jepsen
• https://aphyr.com/posts
• Work from Peter Bailis
• http://www.bailis.org/blog/
• Adrian Colyer’s The Morning Paper
• https://blog.acolyer.org/
• And more … shoulders of giants, remember?
References

Q & A