MoSQL: An Elastic Storage Engine for MySQL


  1. Introduction System Design Performance Conclusions
     MoSQL: An Elastic Storage Engine For MySQL
     Alexander Tomic, Daniele Sciascia, Fernando Pedone
     University of Lugano, Switzerland
     March 20, 2013
     ACM SAC 2013 - Dependable and Distributed Systems Track
  2. 1 Introduction
     2 System Design: MySQL Servers, Storage Nodes, Certifier
     3 Performance: TPC-C
     4 Conclusions: Future Work; Appendix: Similar Offerings to MoSQL; Appendix: B+Tree Details
  3. Introduction
     • MySQL is a popular open-source RDBMS at the core of many web-based applications (part of the “LAMP” stack)
     • Typical approaches to scaling MySQL in the wild (e.g. sharding, asynchronous replication) provide weak guarantees and are inflexible [1]
     • Elasticity is highly desirable in a cyclical world where over-provisioning and energy costs are significant
     • Strong guarantees (serializability) make development much easier
     [1] Since the original master’s thesis in Sept 2011, some commercial offerings have attempted to remedy this. Details in the appendix.
  5. What do we define as “elastic”?
     • Add/remove servers to/from a running system
     • Ideally little performance impact
     • Get Good Things like higher throughput, reduced latency, increased system capacity
  6. SQL (90’s) -> NoSQL (00’s) -> NewSQL (10’s)
     • SQL transactions are great, but legacy RDBMS architectures are too slow and inflexible
     • “NoSQL” systems of various flavours attempted to fill the void (Dynamo, BigTable, etc.), but pushed significant complexity up to application developers
     • Re-emergence of the (semi-)relational model in contemporary systems such as Spanner and Megastore (Google)
     • Ultimately, no panacea but the usual game of tradeoffs
  7. Three Layer Architecture of MoSQL: MySQL Servers, Storage Nodes, Certifier
  8. MySQL Servers
     • MySQL has a storage engine interface enabling different storage strategies to be implemented
     • Serves as a translator from SQL to our storage layer API
     • Multiple MySQL “servers” can be connected arbitrarily to storage nodes
  9. Storage Nodes
     • Multi-version, indexed key-value storage layer
     • Keys distributed among nodes using consistent hashing
     • Keys can be cached; storage nodes can be started as cache-only
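The consistent-hashing bullet above can be illustrated with a small sketch. This is not MoSQL's code: the class name `HashRing`, the use of SHA-256, and the virtual-node count are all assumptions chosen for illustration. It shows the property that matters for elasticity: when a node joins, only keys that now belong to the new node change owner.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a point on the ring (a 64-bit integer)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical, not MoSQL's API)."""

    def __init__(self, vnodes: int = 64):
        self.vnodes = vnodes   # ring points per physical node (assumed value)
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name

    def add_node(self, node: str) -> None:
        """Place `vnodes` points for this node on the ring."""
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        """Drop every ring point that belongs to this node."""
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        """Return the owner of `key`: the first ring point clockwise of its hash."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = HashRing()
    for name in ["node1", "node2", "node3", "node4"]:
        ring.add_node(name)
    keys = [f"row-{i}" for i in range(1000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.add_node("node5")
    moved = [k for k in keys if ring.lookup(k) != before[k]]
    # Every key that moved now lives on the new node; the rest stay put.
    print(len(moved), all(ring.lookup(k) == "node5" for k in moved))
```

Virtual nodes are used here only to even out the share of keys per node; whether MoSQL does the same is not stated in the slides.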
  10. Certifier
      • Checks whether entries read by a committing update tx are up-to-date at time of commit
      • Propagates new entries created by a committing tx to nodes
      • Read-only tx do not require certification; updates proceed optimistically
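The certification rule above can be sketched as follows. This is a minimal illustration under assumptions, not the MoSQL implementation: the `Transaction` and `Certifier` names, the per-key version counter, and the in-memory commit log standing in for propagation to storage nodes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    # key -> version the transaction observed when it read the key
    readset: dict = field(default_factory=dict)
    # key -> new value the transaction wants to install
    writeset: dict = field(default_factory=dict)


class Certifier:
    """Toy optimistic certifier (hypothetical sketch)."""

    def __init__(self):
        self.latest = {}       # key -> version of the last committed write (0 if never written)
        self.commit_log = []   # committed (version, writeset) pairs, i.e. what would be sent to nodes

    def certify(self, tx: Transaction) -> bool:
        """Return True if tx may commit, False if it must abort."""
        if not tx.writeset:
            return True  # read-only transactions skip certification
        # Abort if any entry the transaction read has been overwritten since.
        for key, seen_version in tx.readset.items():
            if self.latest.get(key, 0) != seen_version:
                return False
        # Commit: assign the next version and record the new entries.
        version = len(self.commit_log) + 1
        for key in tx.writeset:
            self.latest[key] = version
        self.commit_log.append((version, dict(tx.writeset)))
        return True


if __name__ == "__main__":
    certifier = Certifier()
    t1 = Transaction(readset={"x": 0}, writeset={"x": "a"})
    t2 = Transaction(readset={"x": 0}, writeset={"x": "b"})
    print(certifier.certify(t1))  # first writer passes
    print(certifier.certify(t2))  # second writer read a stale version of x
```

Both transactions ran without locks; the conflict is only detected at commit time, which is what "updates proceed optimistically" refers to.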
  11. Performance: TPC-C
  12. Experimental Configuration for n-node MoSQL
  13. TPC-C Throughput vs. InnoDB
      • Increasing cost of using disk
      [Figure: throughput (TpmC, 0K to 80K) vs. number of warehouses (10 to 160; 10 warehouses per node in MoSQL). Series: MoSQL, MySQL (InnoDB), Ideal]
  14. TPC-C Latency
      • Large stock-level transactions read from many nodes
      [Figure: latency (s, 0 to 0.6) vs. number of nodes (0 to 16; 10 warehouses per node). Series: Delivery, New Order, Order Status, Payment, Stock Level]
  15. Remote Reads and N-O Throughput for 4 and 8 Nodes
      • From a cold start, inner B+Tree nodes must be cached
      [Figure: throughput (TpmC) and remote read requests vs. time (sec, 0 to 550). Series: 4 nodes, 8 nodes]
  16. Adding Two Storage Nodes Online
      • 60 warehouses (WH), add 8 clients every 12 seconds, add volatile storage nodes at t = 72, 108
      [Figure: throughput (TpmC) and latency (ms) vs. time (sec, 0 to 160), with phases for 4, 5, and 6 storage nodes. Series: MoSQL with node additions, MoSQL baseline]
  17. Future Work
      • Support for different Paxos implementations (the experiments shown use multicast Ring Paxos, which is of limited use in “cloud” environments)
      • Partitioned certification
      • Usability improvements
      • We are in the process of open-sourcing MoSQL! The project page will be updated in the coming weeks: http://dslab.inf.usi.ch/mosql
  18. Appendix
  19. Related Work
      • ElasTraS (UCSB): Elastic data store providing transactional multi-key access to data
      • ecStore (NU Singapore): Peer-to-peer elastic storage with range-query and tx support; neither ecStore nor ElasTraS supports full SQL transactions
      • Spanner (Google): Semi-relational model with wide-area tx, but depends on specialized hardware providing globally-meaningful timestamps
      • Megastore (Google): Semi-relational wide-area tx, but with low latency within small partitions; 2PC used for cross-partition tx
  20. MySQL Specific
      • GenieDB: A storage engine for MySQL with a geo-replicated storage layer. Does not appear to offer elasticity.
      • Xeround: A cloud database service for MySQL applications promising elastic storage for MySQL. Based on a quick look at the patent and whitepaper they have available for download, ACID compliance is provided through a quorum-based approach.
      • Parelastic: Claims many of the features that MoSQL provides, including elasticity. I would have to register in order to get the whitepaper, but looking at the patent they have received, it looks superficially like some kind of middleware approach not unlike Sprint.
  21. MySQL “Compatible”
      • Clustrix: Shared-nothing system claiming MySQL compatibility and ACID compliance. Engine written from the bottom up to be distributed, pushing compiled query fragments down to individual nodes, which apparently enables better concurrency.
      • Scalebase: Another example of Sprint-like middleware that resides between the application and “demoted” RDBMS nodes and manages transactions and the distribution of data across nodes.
      • Intalio: Claims elastic scalability and compatibility with a number of different RDBMS systems, so it would appear to be some sort of Sprint-like middleware, but details are a bit scarce.
  22. B+Tree and Row Data
      • Boxes (a) - (i) are key-values.
      [Figure: B+Tree with inner keys 100 and 120 over leaf keys 95, 100, 105, 120, 125; each leaf key refers to a <raw data> row. Boxes are labelled (a) to (i)]
  23. Some Unnecessary Aborts
      • Consider concurrent tx: t1 = INSERT .. (60) and t2 = INSERT .. (130).
      • The writesets of t1 and t2 are (a) and (a, d), so t1 will be aborted if certified after t2.
      [Figure: the same B+Tree as on the previous slide, with inner keys 100 and 120 over leaf keys 95, 100, 105, 120, 125 and boxes labelled (a) to (i)]
