Zephyr: Live Migration in Shared Nothing Databases for Elastic Cloud Platforms

Aaron J. Elmore, Sudipto Das,
Divyakant Agrawal, Amr El Abbadi
Distributed Systems Lab
University of California Santa Barbara

 Serve thousands of applications (tenants)
◦ AppEngine, Azure, Force.com
 Tenants are (typically)
◦ Small
◦ SLA sensitive
◦ Erratic load patterns
◦ Subject to flash crowds
 i.e. the fark, digg, slashdot, reddit effect (for now)
 Support for Multitenancy is critical
 Our focus: DBMSs serving these platforms

Sudipto Das {sudipto@cs.ucsb.edu}

What the tenant wants…
What the service provider wants…


Static provisioning for peak is inelastic

[Figure: resources vs. time, with capacity and demand curves. Panels: Traditional Infrastructures; Deployment in the Cloud. Labeled region: unused resources.]
Slide Credits: Berkeley RAD Lab


[Figure: Load Balancer, Application/Web/Caching tier, Database tier]


 Migrate a tenant’s database in a live system
◦ A critical operation to support elasticity
 Different from
◦ Migration between software versions
◦ Migration in case of schema evolution


 VM migration [Clark et al., NSDI 2005]
 One tenant-per-VM
◦ Pros: allows fine-grained load balancing
◦ Cons
 Performance overhead
 Poor consolidation ratio [Curino et al., CIDR 2011]
 Multiple tenants in a VM
◦ Pros: good performance
◦ Cons: must migrate all tenants together, so load balancing is coarse-grained


 Multiple tenants share the same
database process
◦ Shared process multitenancy
◦ Example systems: SQL Azure, ElasTraS, RelationalCloud, and many more

 Migrate individual tenants
 VM migration cannot be used for fine-grained
migration
 Target architecture: Shared Nothing
◦ Shared storage architectures: see our VLDB 2011 Paper


 How to ensure no downtime?
 Need to migrate the persistent database image
(tens of MBs to GBs)
 How to guarantee correctness during
failures?
 Nodes can fail during migration
 How to ensure transaction atomicity and durability?
 How to recover migration state after failure?
 Nodes recover after a failure
 How to guarantee serializability?
 Transaction correctness equivalent to normal
operation
 How to minimize migration cost? …


 Downtime
◦ Time tenant is unavailable
 Service Interruption
◦ Number of operations failing/transactions aborting
 Migration Overhead/Performance
impact
◦ During normal operation, migration, and after
migration
 Additional Data Transferred
◦ Data transferred in addition to DB’s persistent image


 Migration executed in phases
 Starts with transfer of minimal information to destination
(“wireframe”)
 Source and destination concurrently execute
transactions in one migration phase
 Database pages used as granule of migration
 Pages “pulled” by destination on-demand
 Minimal transaction synchronization
 A page is uniquely owned by either source or destination
 Leverage page level locking
 Logging and handshaking protocols to
tolerate failures
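
The ownership and on-demand pull ideas above can be pictured with a small Python sketch. It is a toy model under stated assumptions, not the prototype's code: `Node`, `DualModeMigration`, and `read_at_destination` are hypothetical names, and a page is just a dictionary entry.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A database node holding the pages it currently owns (hypothetical model)."""
    name: str
    pages: dict = field(default_factory=dict)  # page_id -> page contents


class DualModeMigration:
    """Toy model: every page has exactly one owner; the destination pulls on demand."""

    def __init__(self, source: Node, destination: Node):
        self.source = source
        self.destination = destination

    def owner(self, page_id):
        # Ownership is implied by which node currently holds the page.
        return self.destination if page_id in self.destination.pages else self.source

    def read_at_destination(self, page_id):
        # Pull the page from the source the first time the destination needs it.
        # pop() removes it from the source, so the page never has two owners.
        if page_id not in self.destination.pages:
            self.destination.pages[page_id] = self.source.pages.pop(page_id)
        return self.destination.pages[page_id]


source = Node("source", {"P1": "a", "P2": "b", "P3": "c"})
destination = Node("destination")
migration = DualModeMigration(source, destination)
value = migration.read_at_destination("P3")
```

After the call, P3 lives only at the destination while P1 and P2 are still owned by the source, which is the unique-ownership property the design relies on.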


 For this talk
◦ Small tenants
 i.e. not sharded across nodes.
◦ No replication
◦ No structural changes to indices
 Extensions in the paper
◦ Relaxes these assumptions


[Figure: source and destination nodes. Source: owned pages P1, P2, P3, …, Pn and active transactions TS1, …, TSk. Legend: page owned by node; page not owned by node.]


Freeze index wireframe and migrate

[Figure: source with owned pages P1, P2, P3, …, Pn and active transactions TS1, …, TSk; destination with the same pages shown as un-owned pages.]


Requests for un-owned pages can block

[Figure: P3 is accessed by TDi at the destination and pulled from the source. Source: old, still active transactions TSk+1, …, TSl. Destination: new transactions TD1, …, TDm. Index wireframes remain frozen.]


Pages can be pulled by the destination, if needed

[Figure: pages P1, P2, … are pushed from the source to the destination. Source: completed. Destination: transactions TDm+1, …, TDn.]



Index wireframe un-frozen

[Figure: pages P1, P2, P3, …, Pn owned by the destination, which runs transactions TDn+1, …, TDp.]
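
The sequence of phases shown in the figures above can be summarized as a small state machine. This is a sketch only: the names `Mode`, `NEXT_MODE`, `EXECUTING`, and `run_migration` are hypothetical, and the mapping of figures to the Init, dual, and Finish modes is an assumption based on how the talk names the modes elsewhere.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"   # a single node owns all pages and serves the tenant
    INIT = "init"       # index wireframe frozen and copied to the destination
    DUAL = "dual"       # both nodes execute transactions; pages pulled on demand
    FINISH = "finish"   # remaining pages pushed; only the destination executes


# Allowed transitions, in the order the phases appear in the talk.
NEXT_MODE = {
    Mode.NORMAL: Mode.INIT,
    Mode.INIT: Mode.DUAL,
    Mode.DUAL: Mode.FINISH,
    Mode.FINISH: Mode.NORMAL,
}

# Which node(s) may execute transactions in each migration mode.
EXECUTING = {
    Mode.INIT: {"source"},
    Mode.DUAL: {"source", "destination"},
    Mode.FINISH: {"destination"},
}


def run_migration():
    """Walk through one full migration and return the sequence of modes."""
    mode, trace = Mode.NORMAL, [Mode.NORMAL]
    while True:
        mode = NEXT_MODE[mode]
        trace.append(mode)
        if mode is Mode.NORMAL:
            return trace
```

Only the dual mode has both nodes executing transactions at once, which is why it is the only mode that needs extra synchronization.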



 Once migrated, pages are never pulled
back by source
◦ Transactions at source accessing migrated pages are
aborted
 No structural changes to indices during
migration
◦ Transactions (at both nodes) that make structural
changes to indices abort
 Destination “pulls” pages on-demand
◦ Transactions at the destination experience higher
latency compared to normal operation
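
The three rules above can be read as a decision function for a single page access during migration. The sketch below is illustrative: `decide` and its return strings are hypothetical names, not part of the prototype.

```python
def decide(node, page_migrated, changes_index_structure):
    """Return what happens to one page access during migration (toy rules).

    node: "source" or "destination"
    page_migrated: True if the page has already moved to the destination
    changes_index_structure: True if the operation would restructure an index
    """
    if changes_index_structure:
        # The index wireframe is frozen on both nodes during migration.
        return "abort"
    if node == "source":
        # Pages are never pulled back, so the source cannot use migrated pages.
        return "abort" if page_migrated else "execute"
    # Destination: fetch the page first if it is still at the source.
    return "execute" if page_migrated else "pull_then_execute"
```

The `pull_then_execute` case is where the extra latency at the destination comes from: the transaction waits for a remote page fetch before it can proceed.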


 Only concern is “dual mode”
◦ Init and Finish: only one node is executing transactions
 Local predicate locking of internal index and exclusive page-level locking between nodes → no phantoms
 Strict 2PL → transactions are locally serializable
 Pages transferred only once
◦ No Tdest → Tsource conflict dependency
 Guaranteed serializability
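
The argument above can be checked on a toy access history. Because a page is transferred only once, all accesses at the source come before all accesses at the destination, so every cross-node dependency points from a source transaction to a destination transaction and no cycle can span the two nodes. The sketch below conservatively treats every pair of accesses to the same page as conflicting; `cross_node_edges` and `only_source_to_destination` are hypothetical helper names.

```python
def cross_node_edges(page_history):
    """Conflict edges that span the two nodes for one page.

    page_history: list of (txn_id, node) in the order transactions accessed the page.
    Returns a set of (earlier_txn, later_txn) pairs whose nodes differ.
    """
    edges = set()
    for i, (t1, n1) in enumerate(page_history):
        for t2, n2 in page_history[i + 1:]:
            if n1 != n2:
                edges.add((t1, t2))
    return edges


def only_source_to_destination(page_history):
    """True if no destination transaction precedes a source transaction on this page."""
    node_of = dict(page_history)
    return all(node_of[a] == "source" and node_of[b] == "destination"
               for a, b in cross_node_edges(page_history))


# A history that respects "pages transferred only once".
good = [("TS1", "source"), ("TS2", "source"), ("TD1", "destination")]
# A history that would require pulling the page back to the source.
bad = [("TD1", "destination"), ("TS1", "source")]
```

The `bad` history is exactly what the "never pulled back" rule forbids; aborting the source transaction instead keeps all cross-node edges pointing one way.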


 Transaction recovery
◦ For every database page, transactions at source
ordered before transactions at destination
◦ After failure, conflicting transactions replayed in
the same order
 Migration recovery
◦ Atomic transitions between migration modes
 Logging and handshake protocols
◦ Every page has exactly one owner
 Bookkeeping at the index level
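
A minimal sketch of log-based migration recovery follows, assuming an append-only list stands in for a durable log. The talk keeps page ownership bookkeeping at the index level; logging ownership changes here is a simplification made for brevity, and `MigrationLog`, `transition`, `transfer_ownership`, and `recover` are hypothetical names. The handshake between the two nodes is not modeled.

```python
class MigrationLog:
    """Append-only list standing in for a durable log (assumption)."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)


def transition(log, new_mode):
    # Log the mode change before acting on it, so a crash cannot lose it.
    log.append(("mode", new_mode))


def transfer_ownership(log, page_id, new_owner):
    # Record the new owner so each page has exactly one owner after recovery.
    log.append(("owner", page_id, new_owner))


def recover(log):
    """Rebuild migration mode and page ownership by replaying the log in order."""
    mode, owners = "normal", {}
    for record in log.records:
        if record[0] == "mode":
            mode = record[1]
        else:
            _, page_id, new_owner = record
            owners[page_id] = new_owner
    return mode, owners


log = MigrationLog()
transition(log, "init")
transition(log, "dual")
transfer_ownership(log, "P3", "destination")
mode, owners = recover(log)
```

Replaying the records in order means the recovered state always reflects the last logged transition, which is one simple way to get the atomic mode changes listed above.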


 In the presence of arbitrary repeated
failures, Zephyr ensures:
◦ Updates made to database pages are consistent
◦ A failure does not leave a page without an owner
◦ Both source and destination are in the same
migration mode
 Guaranteed termination and
starvation freedom


 Replicated Tenants
 Sharded Tenants
 Allow structural changes to the indices
◦ Using shared lock managers in the dual mode


 Prototyped using an open source OLTP
database H2
◦ Supports standard SQL/JDBC API
◦ Serializable isolation level
◦ Tree Indices
◦ Relational data model
 Modified the database engine
◦ Added support for freezing indices
◦ Page migration status maintained using index
◦ Details in the paper…
 Tungsten SQL Router migrates JDBC
connections during migration


 Two database nodes, each with a DB
instance running
 Synthetic benchmark as load
generator
◦ Modified YCSB to add transactions
 Small read/write transactions
 Compared against Stop and Copy
(S&C)


Default transaction parameters: 10 operations per transaction (80% read, 15% update, 5% insert)
Workload: 60 sessions, 100 transactions per session
Hardware: 2.4 GHz Intel Core 2 Quads, 8 GB RAM, 7200 RPM SATA HDs with 32 MB cache, Gigabit Ethernet
Default DB size: 100k rows (~250 MB)
[Figure labels: System Controller, Metadata, Migrate]
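
To make the default parameters concrete, the sketch below generates a synthetic workload with the same shape: 60 sessions of 100 transactions, each with 10 operations drawn from an 80/15/5 read/update/insert mix. It is not the modified YCSB used in the experiments; `make_transaction` and `make_workload` are hypothetical names, and the seed is an arbitrary choice.

```python
import random


def make_transaction(rng, ops_per_txn=10,
                     mix=(("read", 0.80), ("update", 0.15), ("insert", 0.05))):
    """Draw the operation types for one transaction from the given mix."""
    names = [name for name, _ in mix]
    weights = [weight for _, weight in mix]
    return rng.choices(names, weights=weights, k=ops_per_txn)


def make_workload(seed=0, sessions=60, txns_per_session=100):
    """Return a list of sessions, each a list of transactions."""
    rng = random.Random(seed)
    return [[make_transaction(rng) for _ in range(txns_per_session)]
            for _ in range(sessions)]


workload = make_workload()
total_txns = sum(len(session) for session in workload)
```

With the defaults, `total_txns` is 60 × 100 = 6,000 transactions.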

 Downtime (tenant unavailability)
◦ S&C: 3 – 8 seconds (needed to migrate,
unavailable for updates)
◦ Zephyr: No downtime. Either source or destination
is available
 Service interruption (failed operations)
◦ S&C: hundreds to thousands of failed operations. All transactions with updates are aborted
◦ Zephyr: tens to hundreds of failed operations. Orders of magnitude less interruption


 Average increase in transaction latency
(compared to the 6,000 transaction
workload without migration)
◦ S&C: 10 – 15%. Cold cache at destination
◦ Zephyr: 10 – 20%. Pages fetched on-demand
 Data transfer
◦ S&C: Persistent database image
◦ Zephyr: 2 – 3% additional data transfer (messaging
overhead)
 Total time taken to migrate
◦ S&C: 3 – 8 seconds. Unavailable for any writes
◦ Zephyr: 10 – 18 seconds. No unavailability


Orders of magnitude fewer failed operations


 Proposed Zephyr, a live database
migration technique with no downtime
for shared nothing architectures
◦ The first end-to-end solution with safety, correctness, and liveness guarantees
 Prototype implementation on a
relational OLTP database
 Low cost on a variety of workloads



 Either source or destination is serving the
tenant
◦ No downtime
 Serializable transaction execution
◦ Unique page ownership
◦ Local multi-granularity locking
 Safety in the presence of failures
◦ Transactions are atomic and durable
◦ Migration state is recovered from log
 Ensure consistency of the database state


 Wireframe copy
 Typically orders of magnitude smaller than data
 Operational overhead during
migration
 Extra data (in addition to database pages)
transferred
 Transactions aborted during migration


Failures due to attempted modification of index structure


 Only committed transactions reported
 Loss of cache for both migration types
 Zephyr results in a remote fetch
