What Ever Happened to Durability?

Tom Lyon
Founder & Chief Scientist, DriveScale
@aka_pugs

Durability & ACID
§  Atomicity
§  Consistency
§  Isolation
§  Durability

Durability Defined
§  If written data is acknowledged, it must
be forever readable
§  If written data is read once [before it is
acknowledged], it must be forever
readable

Nothing is Forever
§  Hardware eventually fails
§  Software eventually (?) works
§  Durability is a matter of degree
§  What is good enough?

Estimating Durability
https://www.backblaze.com/blog/cloud-storage-durability/
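
As a rough illustration of how such an estimate can be put together, here is a back-of-envelope sketch. It is not the method used in the linked post. It assumes that drive failures are independent, that every copy has the same annual failure rate, and that a lost copy is rebuilt within a fixed repair window. The function names and parameters are hypothetical.

```python
import math


def annual_loss_probability(afr: float, repair_days: float, copies: int) -> float:
    """Crude estimate of the chance of losing an object within one year.

    Assumptions (illustrative only): failures are independent, each copy
    has annual failure rate `afr`, and a failed copy is rebuilt within
    `repair_days`. Data is lost if one copy fails and every remaining
    copy also fails before the rebuild completes.
    """
    # Probability that at least one of the copies fails during the year.
    p_first = 1.0 - (1.0 - afr) ** copies
    # Probability that one given surviving copy fails inside the repair window.
    p_window = afr * repair_days / 365.0
    return p_first * p_window ** (copies - 1)


def nines(p_loss: float) -> float:
    """Express a loss probability as a count of 'nines' of durability."""
    return -math.log10(p_loss)


if __name__ == "__main__":
    for copies in (1, 2, 3):
        p = annual_loss_probability(afr=0.02, repair_days=7, copies=copies)
        print(f"{copies} copies: p_loss={p:.3e}, nines={nines(p):.1f}")
```

The independence assumption is the weak point: any event that takes out several copies at once makes the real number worse than this estimate.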

Performance is the Enemy
§  “The only good write is an O_SYNC write”
§  Write-behind, caching, background
compaction/migration can all lead to
hidden errors
§  fsync(2) can and should return errors, but
misses some
§  See https://wiki.postgresql.org/wiki/Fsync_Errors
§  PostgreSQL: Caring about durability since 1986
§  “commit intervals”?
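
A minimal sketch of what "caring about fsync" can look like in application code, assuming a POSIX system. The helper name `durable_write` is hypothetical. It acknowledges a write only after fsync(2) has succeeded on both the file and its parent directory, and it lets any error propagate instead of retrying. As the slide says, a clean return still does not prove that every layer below behaved.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write `data` to `path`; return only after fsync has succeeded.

    Raises OSError if the write or either fsync fails. The caller should
    treat that as "not durable" and must not acknowledge the write.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush file data and metadata; raises OSError on failure
    finally:
        os.close(fd)

    # On POSIX, also fsync the directory so the new directory entry is persisted.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Opening the file with `os.O_SYNC` instead would make every write synchronous, at the performance cost the slide title alludes to.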

Can’t trust a File System
“We analyze 11 applications, and find 60 vulnerabilities,
some of which result in severe consequences like
corruption or data loss.”

Can’t trust an SSD
‘Surprisingly, we find that 13 out of the 15 devices,
including the supposedly “enterprise-class” devices,
exhibit failure behavior contrary to our expectations’

Servers and Mayflies
§  Back in the day, when “the” computer
crashed, you just waited for repair
§  Now you remove or re-image the server –
with the drives
§  Local durability is really hard,
but no longer adequate

Replication
§  Backups? Not timely
§  Synchronous mirroring? Very expensive
§  Just use the network! Make copies! Go
forth and replicate!
§  Losing a disk or server no longer causes
lost data. Right? Who needs fsync?

Correlated Failures
§  AWS can lose a data center, you can too
§  Rack power problems are common
§  The smaller your cluster, the more
vulnerable it is
https://xkcd.com/1737
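
To put a number on the small-cluster point, here is a sketch under simplifying assumptions: replicas are placed on distinct nodes chosen uniformly at random, with no rack awareness, and every rack holds the same number of nodes. The function name is hypothetical. It computes the chance that all replicas of an object sit in one rack, where a single power problem would take them all out.

```python
from math import comb


def p_all_replicas_one_rack(racks: int, nodes_per_rack: int, replicas: int) -> float:
    """Probability that every replica of an object lands in the same rack.

    Assumes uniform random placement on distinct nodes and equal-sized
    racks (illustrative assumptions, not a real placement policy).
    """
    total_placements = comb(racks * nodes_per_rack, replicas)
    same_rack_placements = racks * comb(nodes_per_rack, replicas)
    return same_rack_placements / total_placements


if __name__ == "__main__":
    for racks in (1, 2, 4, 16):
        p = p_all_replicas_one_rack(racks, nodes_per_rack=10, replicas=3)
        print(f"{racks:2d} racks: {p:.4f}")
```

With a single rack the probability is 1 by construction, and it falls as racks are added, which is why placement policies usually spread copies across failure domains on purpose.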

CAP Theorem
§  You will have Partitioning.
§  You must choose between Availability
and Consistency.
§  Your users will hate your choice.
§  Availability can be improved by brute
force and $$$ - to reduce partitioning.
§  Consistency requires consensus.

Jepsen breaks everything
“Use Zookeeper. It’s mature, well-designed, and battle-tested.”
“The etcd and Consul teams both take consistency seriously…”
Kyle Kingsbury, https://jepsen.io

Logs & Journals
§  Application first writes to log, then to
where the data “really lives”
§  FS writes to journal, then to where the
data “really lives”
§  Device writes to log, then to where the
data “really lives”
§  What if “the truth” “really lived” in the log?
§  The other places become read caches
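
The "log as truth" idea can be sketched in a few lines. This is a toy illustration, not any particular system's design: the class name `LogStore` and the JSON-lines record format are assumptions. Every write is appended and fsynced to the log before it is acknowledged, and the in-memory dictionary is only a read cache that can be thrown away and rebuilt by replaying the log.

```python
import json
import os


class LogStore:
    """Toy key-value store in which an append-only log is the source of truth."""

    def __init__(self, path: str) -> None:
        self.cache = {}  # rebuildable read cache, never authoritative
        good_bytes = 0
        if os.path.exists(path):
            with open(path, "rb") as f:
                for line in f:
                    # A record without a trailing newline is a torn write.
                    if not line.endswith(b"\n"):
                        break
                    try:
                        record = json.loads(line)
                    except ValueError:
                        break
                    self.cache[record["key"]] = record["value"]
                    good_bytes += len(line)
            # Drop any torn tail so new records start on a clean boundary.
            with open(path, "r+b") as f:
                f.truncate(good_bytes)
        self._log = open(path, "ab")

    def put(self, key: str, value) -> None:
        record = json.dumps({"key": key, "value": value}) + "\n"
        self._log.write(record.encode("utf-8"))
        self._log.flush()
        os.fsync(self._log.fileno())  # log first; only then update the cache
        self.cache[key] = value

    def get(self, key: str):
        return self.cache.get(key)

    def close(self) -> None:
        self._log.close()
```

Reopening the same path after a crash rebuilds the cache from the log alone, so nothing outside the log needs to be durable.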

Table and Stream Duality
§  “A table is just a cache of the latest value
for each key in a stream” – P. Helland
§  Logs are great for streaming data
§  What if the log itself is distributed and
allows many writers and readers?
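
The duality in the quote can be shown directly: folding a stream of (key, value) events yields the table, and folding only a prefix yields the table as it stood at that offset. The function name `materialize` and the `upto` parameter are hypothetical, chosen for this sketch.

```python
from typing import Any, Dict, Hashable, Iterable, Optional, Tuple


def materialize(
    stream: Iterable[Tuple[Hashable, Any]], upto: Optional[int] = None
) -> Dict[Hashable, Any]:
    """Build a table holding the latest value for each key in a stream.

    If `upto` is given, only the first `upto` events are applied, which
    gives the table as of that offset in the stream.
    """
    table: Dict[Hashable, Any] = {}
    for offset, (key, value) in enumerate(stream):
        if upto is not None and offset >= upto:
            break
        table[key] = value  # later events overwrite earlier ones
    return table


if __name__ == "__main__":
    events = [("a", 1), ("b", 2), ("a", 3)]
    print(materialize(events))
    print(materialize(events, upto=2))
```

Each reader can keep its own offset and build its own table, which is what makes a shared log attractive for many independent consumers.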

Streaming Systems
§  Apache Kafka
§  60 second “commit interval?”
§  Apache Pulsar
§  Uses Apache Bookkeeper
§  Distributed Logs:
§  Apache DistributedLog – uses Bookkeeper
§  Facebook LogDevice

Apache Bookkeeper™
§  “A scalable, fault-tolerant, and low-latency
storage service optimized for
real-time workloads”
§  Guarantees:
§  “If an entry has been acknowledged, it must
be readable”
§  “If an entry has been read once, it must
always be readable”
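
One way to see how an acknowledgment can carry the first guarantee is a toy quorum model. This is not the Bookkeeper client API: `ToyBookie`, `add_entry`, `read_entry`, and `ack_quorum` are names invented for the sketch. The client fans an entry out to several storage nodes and acknowledges it only once at least `ack_quorum` of them have stored it, so the entry stays readable as long as fewer than `ack_quorum` of those nodes are lost afterward.

```python
from typing import Dict, List, Optional


class ToyBookie:
    """Stand-in for a storage node; it keeps entries in memory for illustration."""

    def __init__(self, healthy: bool = True) -> None:
        self.healthy = healthy
        self.entries: Dict[int, bytes] = {}

    def add(self, entry_id: int, payload: bytes) -> bool:
        if not self.healthy:
            return False  # node is down: no ack
        self.entries[entry_id] = payload
        return True


def add_entry(bookies: List[ToyBookie], entry_id: int, payload: bytes, ack_quorum: int) -> bool:
    """Send the entry to every node; acknowledge only if enough nodes stored it."""
    acks = sum(1 for bookie in bookies if bookie.add(entry_id, payload))
    return acks >= ack_quorum


def read_entry(bookies: List[ToyBookie], entry_id: int) -> Optional[bytes]:
    """Return the entry from any healthy node that holds it, else None."""
    for bookie in bookies:
        if bookie.healthy and entry_id in bookie.entries:
            return bookie.entries[entry_id]
    return None
```

In this model the storage nodes never coordinate with each other; the client alone decides when enough copies exist to acknowledge.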

Bookkeeper Components
§  Client-side library
§  Distributed Ledger Abstraction
§  “Bookie” – very simple storage nodes
§  Bookies do NOT talk to each other
§  Zookeeper coordination, consensus,
cluster membership, and quorums

Bookkeeper Data Flow
[Figure: data flow between Apps and Bookies]

Planet Java
§  Zookeeper and Bookkeeper are both
from planet Java
§  How about something more friendly to
Planet Linux?
§  Use etcd, rewrite Bookkeeper like
ScyllaDB did for Cassandra?

Take-aways
§  Durability is Hard
§  Distributed Durability is Very Hard
§  Be Up-Front about your durability model
§  Logs as Truth & Streaming are the future
§  Apache Bookkeeper is awesome
§  Don’t re-invent the wheel!

Q & A
Software Composable Infrastructure
for modern workloads
and commodity hardware.
