Many applications today depend on databases. Access to past states of database data enables a new kind of useful query: the time-traveling query. With time travel, application developers can analyze and predict trends in changing data over time, detect data anomalies, and recover from user error such as accidental deletion of data (without relying on a cumbersome database restore). System administrators, meanwhile, want simple and efficient backups. Database snapshots can serve both needs, without disrupting performance.
This talk dives into snapshots as a database system service. We will discuss design choices for snapshots and time travel, and how those choices impact applications. You will learn about novel research on how to add snapshots to a database system in a modular way, and we will touch on the challenges and opportunities that arise when that database is distributed.
1. From Backups To Time Travel: A
Systems Perspective on Snapshots
Ross Shaull
2. Where it comes from
A Modular and Efficient Past State System for Berkeley DB.
Ross Shaull, Liuba Shrira, Barbara Liskov. USENIX 2014.
Retro: A Methodology for Retrospection Everywhere (PhD thesis).
Ross Shaull. 2013.
Skippy: a new snapshot indexing method for time travel in the storage manager.
Ross Shaull, Liuba Shrira, Hao Xu. SIGMOD 2008.
3. Talk plan
• Define snapshots and programming model
• Architecture
• Compare with other approaches
• Formal model for deriving snapshot protocols
• How snapshots are saved and accessed
• A peek at why snapshots are recoverable
• Challenges and opportunities in a distributed system
– Speculation and pronouncement
4. Snapshots and Retrospection
• Past states of data can provide insights
– trend analysis
– anomaly and intrusion detection
• Auditing may require past-state retention
• Saving consistent past states (snapshots) is challenging and not available in all data stores
• Retrospection = point-in-time query + programming model
7. Programming Model
begin;
insert into accounts values(…);
update accounts set balance=0 where name='Tom';
snapshot;
commit;

select * from accounts where name = 'Tom' as of S;
8. Backup
• Separate copy of data
– Usually stored off-site
• Think of it as a copy of a snapshot
– Can also be incremental of course
• Snapshots and backups are different: an in-situ snapshot does not protect against catastrophic storage loss, and a backup does not enable retrospection
11. Why this design for Retro?
• Logical-level snapshots require significant modifications to the data store
[Diagram: the application sits on the database interface; inside the database, access methods/indexes sit above the TSM; a snapshot layer alongside the TSM serves "as of snap" requests and writes to a Retro disk next to the DB disk]
12. Why this design for Retro?
• Logical-level snapshots require significant modifications to the data store
• With low-level snapshots, it can be expensive to get consistency
(architecture diagram repeated from slide 11)
13. Why this design for Retro?
• Logical-level snapshots require significant modifications to the data store
• With low-level snapshots, it's expensive to get consistency
• Retro is not "too high" or "too low"
– Simple integration and non-disruptive
(architecture diagram repeated from slide 11)
14. An interlude
• We’ll look at a snapshot strategy that depends on a system layer below the database
– What are its strengths and weaknesses?
– Why implement inside the database?
• We’ll also quickly review some database
concepts
15. Atomic file system snapshot
• Below the level of database transactions
• Can provide crash consistent snapshots
• An alternative with limitations
• Before we talk about limitations, let’s look at why it provides crash-consistency
16. Atomicity and Durability
• A and D in ACID
• Modern relational databases typically achieve them with transactions which are logged to a write-ahead log (WAL)
[Diagram: page cache flushing to journal and data files, with the callout "Visibility of this object is which letter in ACID?"]
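The WAL idea above can be pinned down with a minimal sketch. This is a hypothetical in-memory model, not BDB's actual code: atomicity comes from replaying only transactions whose COMMIT record reached the log, and durability from (conceptually) flushing the log before commit returns.

```python
# Minimal write-ahead-logging sketch (hypothetical, in-memory).
class WalDatabase:
    def __init__(self):
        self.log = []          # stands in for the journal on disk
        self.pages = {}        # stands in for the data file

    def commit(self, txn_id, writes):
        # 1. Append all updates plus a COMMIT marker to the log first.
        for page, value in writes.items():
            self.log.append(("UPDATE", txn_id, page, value))
        self.log.append(("COMMIT", txn_id))
        # 2. Only now is it safe to write the data pages themselves.
        self.pages.update(writes)

    def recover(self):
        # Redo only transactions whose COMMIT record made it to the log.
        committed = {rec[1] for rec in self.log if rec[0] == "COMMIT"}
        pages = {}
        for rec in self.log:
            if rec[0] == "UPDATE" and rec[1] in committed:
                pages[rec[2]] = rec[3]
        return pages

db = WalDatabase()
db.commit("T1", {"P1": "a"})
db.log.append(("UPDATE", "T2", "P1", "b"))   # T2 crashed before COMMIT
print(db.recover())                          # {'P1': 'a'}
```

After a crash, T2's partial update is discarded because its COMMIT never hit the log; this is the crash-consistency the file-system snapshot on slide 15 relies on.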
18. Atomic file system snapshot
• Strengths
– Simple: compose it from existing building blocks
• Weaknesses
– Snapshot consistent but may not be precise
– Must perform recovery before retrospection
– Requires file system support
20. Retro snapshots
• Retrospection as of snapshot S is equivalent to a hypothetical current-state query that runs immediately after S was declared
T1: update P1, P2, declare S2
T2: update P1
[Diagram: as the transactions run, the database holds the current versions of pages P1 and P2, while Retro holds the pre-state copies of P1 and P2 belonging to snapshots S1 and S2]
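The equivalence above can be shown with a deliberately naive sketch (my illustration, not Retro's implementation): declaring a snapshot freezes a full copy of the state, and an "as of" read consults that copy.

```python
# Naive sketch of the "as of" semantics: reading as of snapshot S equals
# a current-state read run right after S was declared.
store = {}           # current state: page -> value
snapshots = {}       # snapshot name -> frozen copy of the state

def declare(name):
    snapshots[name] = dict(store)   # full copy (naive; Retro is copy-on-write)

def read(page, as_of=None):
    state = snapshots[as_of] if as_of else store
    return state.get(page)

# T1: update P1, P2, declare S2
store["P1"], store["P2"] = "t1", "t1"
declare("S2")
# T2: update P1
store["P1"] = "t2"

print(read("P1"))              # t2 (current state)
print(read("P1", as_of="S2"))  # t1 (as of S2)
```

Retro achieves the same observable behavior without the full copy, by saving only the pre-states that later get overwritten.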
21. Overwrite sequence (OWS)
• OWS(H) is a tagging of history H
– which page pre-states to save
– the snapshot pages a retrospective query accesses
– which pre-states and snapshot declarations to recover
A Modular and Efficient Past State System for Berkeley DB.
Ross Shaull, Liuba Shrira, Barbara Liskov. USENIX 2014.
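One way to picture the tagging (my reading of the slide, not the paper's formal definition): scan the history and mark each write that is the first overwrite of its page since the most recent snapshot declaration, since those are the pre-states a copy-on-write layer must save.

```python
# Illustrative overwrite-sequence tagging over a history of snapshot
# declarations and page writes (hypothetical simplification).
def ows(history):
    saved_since_snap = set()   # pages whose pre-state is already saved
    tags = []
    for op in history:
        if op[0] == "declare":
            saved_since_snap = set()   # new snapshot: nothing saved yet
            tags.append((op, None))
        else:                          # ("write", page)
            first = op[1] not in saved_since_snap
            if first:
                saved_since_snap.add(op[1])
            tags.append((op, "save pre-state" if first else "skip"))
    return tags

history = [("declare", "S1"), ("write", "P1"), ("write", "P1"),
           ("declare", "S2"), ("write", "P1")]
for op, tag in ows(history):
    print(op, tag)
```

Only the first write to P1 after each declaration is tagged "save pre-state"; repeated writes to the same page within a snapshot interval cost nothing extra.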
22. Protocol extension: MVCC
• Concurrent access to current state and
snapshots
• Efficient copying of snapshots
• Retrospection runs using MVCC; page requests are redirected to snapshot pages that have migrated to the Retro disk
23. Retrospection (querying as of)
[Diagram: an "as of" query enters the interface and passes through the access methods/indexes and page cache; the snapshot layer's page name translation looks up (DBFile, P)@S1 and redirects Get (DBFile, P) to Get (RetroFile, X) on the Retro disk]
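The redirection in the diagram can be sketched as a small translation table (an assumed shape for illustration; the names DBFile, P, X, and S1 come from the slide):

```python
# Hypothetical sketch of page name translation: current-state Gets go
# straight to the database disk, while "as of" Gets may be redirected
# to a pre-state page migrated to the Retro disk.
db_disk = {("DBFile", "P"): "current"}
retro_disk = {("RetroFile", "X"): "pre-state"}

# (DBFile, P) was overwritten after S1, so S1's copy lives on Retro disk.
translation = {(("DBFile", "P"), "S1"): ("RetroFile", "X")}

def get(page, as_of=None):
    if as_of is not None:
        redirected = translation.get((page, as_of))
        if redirected is not None:
            return retro_disk[redirected]
    return db_disk[page]   # current read, or page unchanged since the snapshot

print(get(("DBFile", "P")))              # current
print(get(("DBFile", "P"), as_of="S1"))  # pre-state
```

Pages not overwritten since the snapshot have no translation entry, so retrospective reads of them fall through to the database disk unchanged, which is why current-state queries (slide 24) pay nothing.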
24. Current state queries
[Diagram: same stack, but a current-state Get (DBFile, P) goes directly to the database disk, bypassing translation]
25. Protocol extensions: Recovery
• Like the database, snapshots are written asynchronously (non-disruptiveness)
• Retro saves pre-states during BDB recovery
– Snapshot declarations are also logged
• Needed pre-states are identified using Retro metadata during recovery
26. Protocol extension: Recovery
• Runtime invariants
– Snapshots are made durable first (WAS-invariant)
• Recovery-time extension
– Recover snapshot metadata first
– Idempotent
• we can tell if a pre-state was saved already (using snapshot metadata)
• only recreate pre-states not yet saved
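The idempotence argument above can be sketched as follows (an assumed shape, not BDB's actual recovery code): because the metadata recording which pre-states are durable is recovered first, replaying recovery any number of times recreates only the missing pre-states, exactly once.

```python
# Sketch of idempotent pre-state recovery (hypothetical).
saved = {("P1", "S1")}         # durable metadata: this pre-state was saved
retro_writes = []              # pre-states recreated during recovery

def recover_prestate(page, snap, value):
    if (page, snap) in saved:  # metadata made durable first (WAS-invariant)
        return                 # already saved -> do nothing (idempotent)
    retro_writes.append((page, snap, value))
    saved.add((page, snap))

for _ in range(3):             # replaying recovery repeatedly is safe
    recover_prestate("P1", "S1", "old1")
    recover_prestate("P2", "S1", "old2")

print(retro_writes)            # [('P2', 'S1', 'old2')] -- written exactly once
```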
27. Experimental evaluation
• Implemented as a set of callbacks
– About 250 lines of modifications to BDB source
– Calls into about 5000 lines of snapshot-layer code
• Retro is thread-safe
• Care taken to follow OWS(H) order in the face of concurrency
28. Experimental Results
• Database and snapshot data are written to one disk, logs to a second disk
• Database size is 1 GB
• Snapshot store on Retro disk can be >100 GB
• Non-disruptiveness
– Random update workload with and without Retro
– With Retro, declare a snapshot after every transaction
– Enforcing invariants for snapshot durability imposes about 4% overhead on throughput
30. Now about that future…
• Distributed systems can support snapshots, for example:
– HDFS (a distributed file system) has COW snapshots
– Cassandra can do EC backups
– Many systems must be quiesced first
• We want inexpensive snapshots with online, in-situ retrospection
31. Replicated systems, MVCC
• If there is correspondence between histories on replicas, then a replica can save snapshots
– A replica saving snapshots eventually receives a record of all transactions
• If a distributed system provides snapshot isolation, MVCC can be exploited for retrospection
– Exploit the visibility mechanisms built into the system to determine which objects belong to each snapshot
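The MVCC-based approach can be sketched generically (an assumption about the technique in general, not any specific system's internals): if each object version carries the commit time that created it, a snapshot is just a cutoff, and the version visible as of S is the newest one at or before S's declaration point.

```python
# Sketch of MVCC visibility reused for retrospection (hypothetical data).
versions = {   # object -> append-only list of (commit_time, value)
    "acct": [(1, 100), (3, 250), (7, 0)],
}
snapshot_time = {"S1": 2, "S2": 5}   # declaration points of snapshots

def read_as_of(obj, snap):
    cutoff = snapshot_time[snap]
    visible = [v for t, v in versions[obj] if t <= cutoff]
    return visible[-1] if visible else None

print(read_as_of("acct", "S1"))  # 100
print(read_as_of("acct", "S2"))  # 250
```

No extra copying is needed beyond what MVCC already retains; the snapshot layer only has to keep versions older than the cutoff from being garbage-collected.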
32. We are here at NuoDB after all
[Diagram: NuoDB architecture with transaction engines (TEs) and storage managers (SMs) over a distributed durable cache]
36. Opportunities
• Adding the capability to store and query snapshots is similar to adding capacity
• Multiple replicas can improve the performance of saving and accessing snapshots
• Snapshot workload can be physically separated from the current-state workload within the same database
37. Conclusions
• Retro is a novel design for adding retrospection
– Supports a powerful programming model
– Non-disruptive, long-lived snapshots
• The layered design is key to a simple implementation
– Protocol extensions make it efficient (see paper)
• Adding snapshots to a distributed system is a natural extension, with opportunities to improve the coexistence of current and past states