Introduction
System Design
Performance
Conclusions
MoSQL: An Elastic Storage Engine For MySQL
Alexander Tomic, Daniele Sciascia, Fernando Pedone
University of Lugano, Switzerland
March 20, 2013
ACM SAC 2013 - Dependable and Distributed Systems Track
1 Introduction
2 System Design
MySQL Servers
Storage Nodes
Certifier
3 Performance
TPC-C
4 Conclusions
Future Work
Appendix: Similar Offerings to MoSQL
Appendix: B+Tree Details
MySQL is a popular open-source RDBMS at the core of
many web-based applications (part of “LAMP” stack)
Typical approaches to scaling MySQL in the wild (e.g.
sharding, asynchronous replication) provide weak
guarantees and are inflexible¹
Elasticity highly desirable in a cyclical world where
over-provisioning and energy costs are significant
Strong guarantees (serializability) make development much
easier
¹ Since the original master’s thesis (Sept. 2011), some commercial
offerings have attempted to remedy this; details in the appendix.
What do we define as “elastic”?
Add/remove servers to/from a running system
Ideally little performance impact
Get Good Things like higher throughput, reduced latency,
increased system capacity
SQL (90’s) -> NoSQL (00’s) -> NewSQL (10’s)
SQL transactions are great, but legacy RDBMS architectures
too slow and inflexible
“NoSQL” systems of various flavours attempted to fill the
void (Dynamo, BigTable, etc.), but pushed significant
complexity up to app. developers
Re-emergence of (semi-)relational model in contemporary
systems such as Spanner and Megastore (Google)
Ultimately, no panacea but the usual game of tradeoffs
Three-Layer Architecture of MoSQL
MySQL Servers
MySQL has a storage engine interface that allows different storage strategies to be implemented
Each MySQL server acts as a translator from SQL to our storage-layer API
Multiple MySQL “servers” can be connected arbitrarily to storage nodes
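To make the translator role concrete, here is a minimal sketch of how row-level operations might be mapped onto a key-value API. Everything in it (`KVStore`, `TranslatingEngine`, `write_row`, the tuple key layout, one entry per secondary index) is a hypothetical illustration; it is neither MoSQL's code nor MySQL's actual C++ storage engine interface.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

Key = Tuple[Any, ...]


@dataclass
class KVStore:
    """Hypothetical stand-in for the storage-layer API: a flat key-value map."""
    data: Dict[Key, Any] = field(default_factory=dict)

    def put(self, key: Key, value: Any) -> None:
        self.data[key] = value

    def get(self, key: Key) -> Optional[Any]:
        return self.data.get(key)


class TranslatingEngine:
    """Hypothetical translator from row operations to key-value calls."""

    def __init__(self, store: KVStore, table: str, pk: str, indexed: List[str]) -> None:
        self.store = store
        self.table = table
        self.pk = pk
        self.indexed = indexed

    def write_row(self, row: Dict[str, Any]) -> None:
        pk_value = row[self.pk]
        # The row itself, keyed by (table, "pk", primary key).
        self.store.put((self.table, "pk", pk_value), dict(row))
        # One entry per secondary index, pointing back at the primary key.
        for col in self.indexed:
            self.store.put((self.table, col, row[col], pk_value), pk_value)

    def read_by_pk(self, pk_value: Any) -> Optional[Dict[str, Any]]:
        return self.store.get((self.table, "pk", pk_value))

    def read_by_index(self, col: str, value: Any) -> List[Optional[Dict[str, Any]]]:
        # Linear scan for brevity; a real engine would use an ordered index.
        pks = sorted(k[3] for k in self.store.data
                     if len(k) == 4 and k[:3] == (self.table, col, value))
        return [self.read_by_pk(pk) for pk in pks]


store = KVStore()
customers = TranslatingEngine(store, "customer", pk="c_id", indexed=["c_last"])
customers.write_row({"c_id": 1, "c_last": "SMITH"})
customers.write_row({"c_id": 2, "c_last": "JONES"})
customers.write_row({"c_id": 3, "c_last": "SMITH"})
print([row["c_id"] for row in customers.read_by_index("c_last", "SMITH")])  # [1, 3]
```

Keeping secondary-index entries as ordinary keys means the storage layer only needs `put` and `get`; the SQL-aware logic stays in the MySQL server layer.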
Storage Nodes
Multi-version, indexed key-value storage layer
Keys distributed among nodes using consistent hashing
Keys can be cached; storage nodes can be started as cache-only
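A small consistent-hashing sketch shows why the technique suits an elastic key-value layer: when a node joins, only the keys that now hash to it change owner. SHA-1 as the hash and 64 virtual nodes per physical node are illustrative assumptions, not details taken from MoSQL, and `HashRing` is a hypothetical name.

```python
import bisect
import hashlib
from typing import List, Tuple


def stable_hash(s: str) -> int:
    # 64-bit hash from SHA-1; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hashing with virtual nodes (a sketch, not MoSQL's code)."""

    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self.ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.ring, (stable_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self.ring = [(pos, n) for pos, n in self.ring if n != node]

    def owner(self, key: str) -> str:
        # First virtual node at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self.ring, (stable_hash(key),))
        return self.ring[idx % len(self.ring)][1]


keys = [f"stock:{w}:{i}" for w in range(10) for i in range(100)]
ring = HashRing(["node1", "node2", "node3", "node4"])
before = {k: ring.owner(k) for k in keys}
ring.add_node("node5")
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Every key that moved now belongs to node5; all other keys stayed where they were.
print(f"{len(moved)} of {len(keys)} keys moved to node5")
```

On average a new node takes over roughly its fair share of the key space, so it can be brought online without reshuffling the whole data set.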
Certifier
Checks whether the entries read by a committing update tx are still up-to-date at commit time
Propagates new entries created by committing tx to the storage nodes
Read-only tx do not require certification; updates proceed optimistically
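The certification rule is a form of optimistic concurrency control over a multi-version store. The sketch below assumes a single, sequential certifier and a global commit counter used as the version number; `Transaction`, `MultiVersionStore`, and `Certifier` are hypothetical names, and how MoSQL actually distributes and orders certification is not modelled here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set, Tuple


@dataclass
class Transaction:
    snapshot: int  # last commit version visible when the tx started
    reads: Set[str] = field(default_factory=set)
    writes: Dict[str, Any] = field(default_factory=dict)


class MultiVersionStore:
    """Keeps every committed (version, value) pair of each key."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[Tuple[int, Any]]] = {}

    def read(self, key: str, snapshot: int) -> Optional[Any]:
        # Newest version committed at or before the snapshot.
        for version, value in reversed(self.versions.get(key, [])):
            if version <= snapshot:
                return value
        return None

    def latest_version(self, key: str) -> int:
        history = self.versions.get(key)
        return history[-1][0] if history else 0

    def install(self, key: str, version: int, value: Any) -> None:
        self.versions.setdefault(key, []).append((version, value))


class Certifier:
    def __init__(self, store: MultiVersionStore) -> None:
        self.store = store
        self.last_commit = 0

    def begin(self) -> Transaction:
        return Transaction(snapshot=self.last_commit)

    def read(self, tx: Transaction, key: str) -> Optional[Any]:
        tx.reads.add(key)
        return self.store.read(key, tx.snapshot)

    def commit(self, tx: Transaction) -> bool:
        if not tx.writes:
            return True  # read-only tx: no certification needed
        # Abort if anything the tx read has a newer committed version.
        if any(self.store.latest_version(k) > tx.snapshot for k in tx.reads):
            return False
        self.last_commit += 1
        for key, value in tx.writes.items():
            self.store.install(key, self.last_commit, value)  # propagate new entries
        return True


store = MultiVersionStore()
cert = Certifier(store)
setup = cert.begin()
setup.writes["stock:1"] = 10
cert.commit(setup)

t1, t2 = cert.begin(), cert.begin()
for tx in (t1, t2):
    tx.writes["stock:1"] = cert.read(tx, "stock:1") - 1
r1, r2 = cert.commit(t1), cert.commit(t2)
print(r1, r2)  # True False
```

Both transactions read the same stock entry from the same snapshot; the first to certify wins and the second aborts because its read is no longer up-to-date, while old versions stay readable for transactions with earlier snapshots.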
Performance
Experimental Configuration for n-node MoSQL
TPC-C Throughput vs. InnoDB
Increasing cost of using disk:
[Figure: throughput (tpmC) vs. number of warehouses (10 to 160; 10 warehouses per node in MoSQL); series: MoSQL, MySQL (InnoDB), Ideal]
TPC-C Latency
Large stock-level transactions read from many nodes:
[Figure: latency (s) vs. number of nodes (up to 16; 10 warehouses per node); series: Delivery, New Order, Order Status, Payment, Stock Level]
Remote Reads and New-Order Throughput for 4 and 8 Nodes
From a cold start, inner B+Tree nodes must be cached
[Figure: throughput (tpmC) and number of remote read requests vs. time (0 to 550 s) for 4 and 8 nodes]
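The cold-start effect can be illustrated with a read-through cache: the first lookup that touches an inner node pays a remote read, and later lookups reuse the cached copy. This is a simplified model under stated assumptions (one root plus ten inner nodes, uncached leaves, uniformly random keys); `ReadThroughCache` and `remote_reads_per_batch` are hypothetical, and the counts it prints are not the measured ones.

```python
import random
from typing import Any, Callable, Dict, Hashable, List


class ReadThroughCache:
    """Serves node reads locally once fetched and counts the remote fetches."""

    def __init__(self, fetch_remote: Callable[[Hashable], Any]) -> None:
        self.fetch_remote = fetch_remote
        self.local: Dict[Hashable, Any] = {}
        self.remote_reads = 0

    def get(self, node_id: Hashable) -> Any:
        if node_id not in self.local:
            self.remote_reads += 1
            self.local[node_id] = self.fetch_remote(node_id)
        return self.local[node_id]


def remote_reads_per_batch(batches: int, lookups: int, keys: int = 1000,
                           keys_per_inner: int = 100, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    # Hypothetical remote fetch that just returns a placeholder for the node.
    inner = ReadThroughCache(lambda node_id: f"<contents of {node_id}>")
    counts = []
    for _ in range(batches):
        before = inner.remote_reads
        for _ in range(lookups):
            key = rng.randrange(keys)
            inner.get("root")
            inner.get(("inner", key // keys_per_inner))
        # Leaves are not cached here, so every lookup adds one remote leaf read.
        counts.append(inner.remote_reads - before + lookups)
    return counts


print(remote_reads_per_batch(batches=5, lookups=200))
```

Only the first batch pays for the inner nodes; once they are cached, each lookup costs a single remote read.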
Adding Two Storage Nodes Online
60 warehouses; 8 clients added every 12 seconds; volatile storage
nodes added at t = 72 s and t = 108 s
[Figure: throughput (tpmC) and latency (ms) vs. time (0 to 160 s) for MoSQL with node additions and the MoSQL baseline; phases with 4, 5, and 6 storage nodes]
Future Work
Support for different Paxos implementations (the experiments
shown use multicast Ring Paxos, which is of limited use in
“cloud” environments)
Partitioned certification
Usability improvements
We are in the process of open-sourcing MoSQL! Project
page will be updated in the coming weeks:
http://dslab.inf.usi.ch/mosql
Appendix
Related Work
ElasTraS (UCSB): Elastic data store providing transactional
multi-key access to data
ecStore (NU Singapore): peer-to-peer elastic storage with
range-query and tx support; neither ecStore nor ElasTraS
supports full SQL transactions
Spanner (Google): Semi-relational model with wide-area tx,
but depends on specialized hardware providing
globally meaningful timestamps
Megastore (Google): Semi-relational wide-area tx but with
low latency within small partitions; 2PC used for
cross-partition tx
MySQL Specific
GenieDB: A storage engine for MySQL with a geo-replicated
storage layer. Does not appear to offer elasticity.
Xeround: A cloud database service for MySQL applications
promising elastic storage for MySQL. Based on a quick look at
the patent and whitepaper available for download, ACID
compliance appears to be provided through a quorum-based approach.
Parelastic: Claims many of the features that MoSQL provides,
including elasticity. The whitepaper requires registration, but
judging from the patent they have received, it looks
superficially like a middleware approach not unlike Sprint.
MySQL “Compatible”
Clustrix: Shared-nothing system claiming MySQL
compatibility and ACID compliance. Its engine was written from
the bottom up to be distributed, pushing compiled query
fragments down to individual nodes, apparently enabling
better concurrency.
Scalebase: Another example of Sprint-like middleware that
resides between the application and “demoted” RDBMS
nodes and manages transactions and the distribution of data
across nodes.
Intalio: Claims elastic scalability and compatibility with a
number of different RDBMS systems, so it would appear to
be some sort of Sprint-like middleware, but details are
scarce.
B+Tree and Row Data
Boxes (a) to (i) are key-value entries.
[Figure: B+Tree with root (a) holding keys 100 and 120, inner nodes (b), (c), (d), and leaves (e) to (i) holding keys 95, 100, 105, 120, 125, each with its raw row data]
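One way to keep a B+Tree in a key-value layer is to give every node its own entry and follow child pointers with one read per level. The tree below is a small hypothetical example (node ids, keys, and the `kind`/`entries` fields are all assumptions) and does not reproduce the exact layout in the figure.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple

KV = Dict[str, Dict[str, Any]]


def make_example_tree() -> KV:
    # Hypothetical layout: every node is one key-value entry, node id -> contents.
    tree: KV = {
        "root": {"kind": "inner", "keys": [20, 40], "children": ["n1", "n2", "n3"]},
        "n1": {"kind": "leaf", "entries": {5: "r5", 15: "r15"}},
        "n2": {"kind": "leaf", "entries": {20: "r20", 30: "r30"}},
        "n3": {"kind": "leaf", "entries": {40: "r40", 55: "r55"}},
    }
    for k in (5, 15, 20, 30, 40, 55):
        tree[f"r{k}"] = {"kind": "row", "data": f"<row {k}>"}
    return tree


def lookup(kv: KV, key: int, root: str = "root") -> Tuple[Optional[str], List[str]]:
    """Return (row data or None, ids of every entry fetched on the way down)."""
    path = [root]
    node = kv[root]
    while node["kind"] == "inner":
        # Child i covers keys in [keys[i-1], keys[i]); bisect_right finds i.
        child = node["children"][bisect.bisect_right(node["keys"], key)]
        path.append(child)
        node = kv[child]
    row_id = node["entries"].get(key)
    if row_id is None:
        return None, path
    path.append(row_id)
    return kv[row_id]["data"], path


tree = make_example_tree()
print(lookup(tree, 30))  # ('<row 30>', ['root', 'n2', 'r30'])
print(lookup(tree, 7))   # (None, ['root', 'n1'])
```

Each id in the returned path is one key-value read, so every level of the tree that is not cached locally costs a remote read.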
Some Unnecessary Aborts
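Consider concurrent tx:
t1 = INSERT .. (60) and t2 = INSERT .. (130).
The writesets of t1 and t2 are (a) and (a, d), respectively, so t1 will be
aborted if certified after t2, even though the inserted keys do not
conflict: the only overlap is tree node (a).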
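The abort can be reproduced with a node-level certification check. This sketch assumes that each transaction's readset contains the tree nodes it modifies (an insert reads a node before updating it) and that both transactions start from the same snapshot; `certify` and `apply_commit` are hypothetical helpers.

```python
from typing import Dict, Set


def certify(readset: Set[str], snapshot: int, last_write: Dict[str, int]) -> bool:
    """Pass only if nothing in the readset was rewritten after the snapshot."""
    return all(last_write.get(entry, 0) <= snapshot for entry in readset)


def apply_commit(writeset: Set[str], version: int, last_write: Dict[str, int]) -> None:
    for entry in writeset:
        last_write[entry] = version


last_write: Dict[str, int] = {}   # entry id -> version of its last committed write
snapshot = 0                      # t1 and t2 start from the same state

# Assumption: each insert reads the tree nodes it writes.
t1_nodes = {"a"}          # INSERT .. (60)
t2_nodes = {"a", "d"}     # INSERT .. (130)

t2_ok = certify(t2_nodes, snapshot, last_write)   # t2 is certified first
if t2_ok:
    apply_commit(t2_nodes, 1, last_write)
t1_ok = certify(t1_nodes, snapshot, last_write)   # t1 now sees (a) rewritten
print(t2_ok, t1_ok)  # True False

# At row-key granularity the two inserts would not conflict.
print(certify({"key:60"}, snapshot, {"key:130": 1}))  # True
```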