Many applications today depend on databases. Access to past states of database data enables a new kind of useful query: the time-traveling query. With time travel, application developers can analyze and predict trends in changing data over time, detect data anomalies, and recover from user error such as accidental deletion of data (without relying on a cumbersome database restore). System administrators, meanwhile, want simple and efficient backups. Database snapshots can serve both needs, without disrupting performance.
This talk dives into snapshots as a database system service. We will discuss design choices for snapshots and time travel, and how those choices impact applications. You will learn about novel research on how to add snapshots to a database system in a modular way, and we will touch on the challenges and opportunities that arise when that database is distributed.
1. From Backups To Time Travel: A
Systems Perspective on Snapshots
Ross Shaull
2. Where it comes from
A Modular and Efficient Past State System for Berkeley DB.
Ross Shaull, Liuba Shrira, Barbara Liskov. USENIX 2014.
Retro: A Methodology for Retrospection Everywhere (PhD thesis).
Ross Shaull. 2013.
Skippy: a new snapshot indexing method for time travel in the storage manager.
Ross Shaull, Liuba Shrira, Hao Xu. SIGMOD 2008.
3. Talk plan
• Define snapshots and programming model
• Architecture
• Compare with other approaches
• Formal model for deriving snapshot protocols
• How snapshots are saved and accessed
• A peek at why snapshots are recoverable
• Challenges and opportunities in a distributed system
– Speculation and pronouncement
4. Snapshots and Retrospection
• Past states of data can provide insights
– trend analysis
– anomaly and intrusion detection
• Auditing may require past-state retention
• Saving consistent past states (snapshots) is challenging and not available in all data stores
• Retrospection = point-in-time query + programming model
7. Programming Model
begin;
insert into accounts values(…);
update accounts set balance=0 where name='Tom';
snapshot;
commit;

select * from accounts where name = 'Tom' as of S;
8. Backup
• Separate copy of data
– Usually stored off-site
• Think of it as a copy of a snapshot
– Can also be incremental of course
• Snapshots and backups are different: an in-situ snapshot does not protect against catastrophic storage loss, and a backup does not enable retrospection
11. Why this design for Retro?
• Logical-level snapshots require significant modifications to the data store
[Diagram: the application sits on the database interface; inside the database, access methods/indexes sit above the TSM; a snapshot layer alongside the TSM serves "as of snap" requests and writes to a Retro disk next to the DB disk]
12. Why this design for Retro?
• Logical-level snapshots require significant modifications to the data store
• With low-level snapshots, it can be expensive to get consistency
(architecture diagram repeated from slide 11)
13. Why this design for Retro?
• Logical-level snapshots require significant modifications to the data store
• With low-level snapshots, it's expensive to get consistency
• Retro is not "too high" or "too low"
– Simple integration and non-disruptive
(architecture diagram repeated from slide 11)
14. An interlude
• We’ll look at a snapshot strategy that depends on a system layer below the database
– What are its strengths and weaknesses?
– Why implement inside the database?
• We’ll also quickly review some database
concepts
15. Atomic file system snapshot
• Below the level of database transactions
• Can provide crash consistent snapshots
• An alternative with limitations
• Before we talk about limitations, let’s look at why it provides crash-consistency
16. Atomicity and Durability
• A and D in ACID
• Modern relational databases typically achieve them with transactions which are logged to a write-ahead log (WAL)
[Diagram: page cache flushing to journal and data files, with the callout "Visibility of this object is which letter in ACID?"]
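The WAL idea above can be pinned down with a minimal sketch. This is a hypothetical in-memory model, not BDB's actual code: atomicity comes from replaying only transactions whose COMMIT record reached the log, and durability from (conceptually) flushing the log before commit returns.

```python
# Minimal write-ahead-logging sketch (hypothetical, in-memory).
class WalDatabase:
    def __init__(self):
        self.log = []          # stands in for the journal on disk
        self.pages = {}        # stands in for the data file

    def commit(self, txn_id, writes):
        # 1. Append all updates plus a COMMIT marker to the log first.
        for page, value in writes.items():
            self.log.append(("UPDATE", txn_id, page, value))
        self.log.append(("COMMIT", txn_id))
        # 2. Only now is it safe to write the data pages themselves.
        self.pages.update(writes)

    def recover(self):
        # Redo only transactions whose COMMIT record made it to the log.
        committed = {rec[1] for rec in self.log if rec[0] == "COMMIT"}
        pages = {}
        for rec in self.log:
            if rec[0] == "UPDATE" and rec[1] in committed:
                pages[rec[2]] = rec[3]
        return pages

db = WalDatabase()
db.commit("T1", {"P1": "a"})
db.log.append(("UPDATE", "T2", "P1", "b"))   # T2 crashed before COMMIT
print(db.recover())                          # {'P1': 'a'}
```

After a crash, T2's partial update is discarded because its COMMIT never hit the log; this is the crash-consistency the file-system snapshot on slide 15 relies on.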
18. Atomic file system snapshot
• Strengths
– Simple: compose it from existing building blocks
• Weaknesses
– Snapshot consistent but may not be precise
– Must perform recovery before retrospection
– Requires file system support
20. Retro snapshots
• Retrospection as of snapshot S is equivalent to a hypothetical current-state query that runs immediately after S was declared
T1: update P1, P2, declare S2
T2: update P1
[Diagram: as the transactions run, the database holds the current versions of pages P1 and P2, while Retro holds the pre-state copies of P1 and P2 belonging to snapshots S1 and S2]
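The equivalence above can be shown with a deliberately naive sketch (my illustration, not Retro's implementation): declaring a snapshot freezes a full copy of the state, and an "as of" read consults that copy.

```python
# Naive sketch of the "as of" semantics: reading as of snapshot S equals
# a current-state read run right after S was declared.
store = {}           # current state: page -> value
snapshots = {}       # snapshot name -> frozen copy of the state

def declare(name):
    snapshots[name] = dict(store)   # full copy (naive; Retro is copy-on-write)

def read(page, as_of=None):
    state = snapshots[as_of] if as_of else store
    return state.get(page)

# T1: update P1, P2, declare S2
store["P1"], store["P2"] = "t1", "t1"
declare("S2")
# T2: update P1
store["P1"] = "t2"

print(read("P1"))              # t2 (current state)
print(read("P1", as_of="S2"))  # t1 (as of S2)
```

Retro achieves the same observable behavior without the full copy, by saving only the pre-states that later get overwritten.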
21. Overwrite sequence (OWS)
• OWS(H) is a tagging of history H
– which page pre-states to save
– the snapshot pages a retrospective query accesses
– which pre-states and snapshot declarations to recover
A Modular and Efficient Past State System for Berkeley DB.
Ross Shaull, Liuba Shrira, Barbara Liskov. USENIX 2014.
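One way to picture the tagging (my reading of the slide, not the paper's formal definition): scan the history and mark each write that is the first overwrite of its page since the most recent snapshot declaration, since those are the pre-states a copy-on-write layer must save.

```python
# Illustrative overwrite-sequence tagging over a history of snapshot
# declarations and page writes (hypothetical simplification).
def ows(history):
    saved_since_snap = set()   # pages whose pre-state is already saved
    tags = []
    for op in history:
        if op[0] == "declare":
            saved_since_snap = set()   # new snapshot: nothing saved yet
            tags.append((op, None))
        else:                          # ("write", page)
            first = op[1] not in saved_since_snap
            if first:
                saved_since_snap.add(op[1])
            tags.append((op, "save pre-state" if first else "skip"))
    return tags

history = [("declare", "S1"), ("write", "P1"), ("write", "P1"),
           ("declare", "S2"), ("write", "P1")]
for op, tag in ows(history):
    print(op, tag)
```

Only the first write to P1 after each declaration is tagged "save pre-state"; repeated writes to the same page within a snapshot interval cost nothing extra.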
22. Protocol extension: MVCC
• Concurrent access to current state and
snapshots
• Efficient copying of snapshots
• Retrospection runs using MVCC; page requests are redirected to snapshot pages that have migrated to the Retro disk
23. Retrospection (querying as of)
[Diagram: an "as of" query enters the interface and passes through the access methods/indexes and page cache; the snapshot layer's page name translation looks up (DBFile, P)@S1 and redirects Get (DBFile, P) to Get (RetroFile, X) on the Retro disk]
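The redirection in the diagram can be sketched as a small translation table (an assumed shape for illustration; the names DBFile, P, X, and S1 come from the slide):

```python
# Hypothetical sketch of page name translation: current-state Gets go
# straight to the database disk, while "as of" Gets may be redirected
# to a pre-state page migrated to the Retro disk.
db_disk = {("DBFile", "P"): "current"}
retro_disk = {("RetroFile", "X"): "pre-state"}

# (DBFile, P) was overwritten after S1, so S1's copy lives on Retro disk.
translation = {(("DBFile", "P"), "S1"): ("RetroFile", "X")}

def get(page, as_of=None):
    if as_of is not None:
        redirected = translation.get((page, as_of))
        if redirected is not None:
            return retro_disk[redirected]
    return db_disk[page]   # current read, or page unchanged since the snapshot

print(get(("DBFile", "P")))              # current
print(get(("DBFile", "P"), as_of="S1"))  # pre-state
```

Pages not overwritten since the snapshot have no translation entry, so retrospective reads of them fall through to the database disk unchanged, which is why current-state queries (slide 24) pay nothing.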
24. Current state queries
[Diagram: same stack, but a current-state Get (DBFile, P) goes directly to the database disk, bypassing translation]
25. Protocol extensions: Recovery
• Like the database, snapshots are written asynchronously (non-disruptiveness)
• Retro saves pre-states during BDB recovery
– Snapshot declarations are also logged
• Needed pre-states are identified using Retro metadata during recovery
26. Protocol extension: Recovery
• Runtime invariants
– Snapshots are made durable first (WAS-invariant)
• Recovery-time extension
– Recover snapshot metadata first
– Idempotent
• we can tell if a pre-state was saved already (using snapshot metadata)
• only recreate pre-states not yet saved
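The idempotence argument above can be sketched as follows (an assumed shape, not BDB's actual recovery code): because the metadata recording which pre-states are durable is recovered first, replaying recovery any number of times recreates only the missing pre-states, exactly once.

```python
# Sketch of idempotent pre-state recovery (hypothetical).
saved = {("P1", "S1")}         # durable metadata: this pre-state was saved
retro_writes = []              # pre-states recreated during recovery

def recover_prestate(page, snap, value):
    if (page, snap) in saved:  # metadata made durable first (WAS-invariant)
        return                 # already saved -> do nothing (idempotent)
    retro_writes.append((page, snap, value))
    saved.add((page, snap))

for _ in range(3):             # replaying recovery repeatedly is safe
    recover_prestate("P1", "S1", "old1")
    recover_prestate("P2", "S1", "old2")

print(retro_writes)            # [('P2', 'S1', 'old2')] -- written exactly once
```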
27. Experimental evaluation
• Implemented as a set of callbacks
– About 250 lines of modifications to BDB source
– Calls into about 5000 lines of snapshot-layer code
• Retro is thread-safe
• Care taken to follow OWS(H) order in the face of concurrency
28. Experimental Results
• Database and snapshot data are written to one disk, logs to a second disk
• Database size is 1 GB
• Snapshot store on Retro disk can be >100 GB
• Non-disruptiveness
– Random update workload with and without Retro
– With Retro, declare a snapshot after every transaction
– Enforcing invariants for snapshot durability imposes about 4% overhead on throughput
30. Now about that future…
• Distributed systems can support snapshots, for example:
– HDFS (a distributed file system) has COW snapshots
– Cassandra can do EC backups
– Many systems must be quiesced first
• We want inexpensive snapshots with online, in-situ retrospection
31. Replicated systems, MVCC
• If there is correspondence between histories on replicas, then a replica can save snapshots
– A replica saving snapshots eventually receives a record of all transactions
• If a distributed system provides snapshot isolation, MVCC can be exploited for retrospection
– Exploit the visibility mechanisms built into the system to determine which objects belong to each snapshot
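The MVCC-based approach can be sketched generically (an assumption about the technique in general, not any specific system's internals): if each object version carries the commit time that created it, a snapshot is just a cutoff, and the version visible as of S is the newest one at or before S's declaration point.

```python
# Sketch of MVCC visibility reused for retrospection (hypothetical data).
versions = {   # object -> append-only list of (commit_time, value)
    "acct": [(1, 100), (3, 250), (7, 0)],
}
snapshot_time = {"S1": 2, "S2": 5}   # declaration points of snapshots

def read_as_of(obj, snap):
    cutoff = snapshot_time[snap]
    visible = [v for t, v in versions[obj] if t <= cutoff]
    return visible[-1] if visible else None

print(read_as_of("acct", "S1"))  # 100
print(read_as_of("acct", "S2"))  # 250
```

No extra copying is needed beyond what MVCC already retains; the snapshot layer only has to keep versions older than the cutoff from being garbage-collected.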
32. We are here at NuoDB after all
[Diagram: NuoDB architecture with transaction engines (TEs) and storage managers (SMs) over a distributed durable cache]
36. Opportunities
• Adding the capability to store and query snapshots is similar to adding capacity
• Multiple replicas can improve the performance of saving and accessing snapshots
• Snapshot workload can be physically separated from the current-state workload within the same database
37. Conclusions
• Retro is a novel design for adding retrospection
– Supports a powerful programming model
– Non-disruptive, long-lived snapshots
• The layered design is key to a simple implementation
– Protocol extensions make it efficient (see paper)
• Adding snapshots to a distributed system is a natural extension, with opportunities to improve the coexistence of current and past states