
Best storage engine for MySQL


MySQL engine swap for InnoDB. Better, cheaper, faster!

Published in: Data & Analytics


  1. DeepDB for MySQL® Overview, July 2014
  2. The World We Live In…
     • According to IDC, the database software market has a CAGR of 34.2%
     • Wal-Mart generates 1 million new database records every hour
     • Chevron generates data at a rate of 2TB/day
     • According to the Data Warehousing Institute, 46% of companies plan to replace their existing data warehousing platforms
     • Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone
  3. MySQL Challenges
     • Performance degrades as tables get larger – limitations of the underlying computer science
     • Highly indexed schemas negatively impact performance – more indexes help query performance but hurt transactions
     • Poor performance with complex queries – many table joins
     • Data loading times are slow due to poor concurrency – table locking and single-threaded operations
     • Backup time and performance impact – big databases are slow to back up, and backups affect system performance
  4. Technology Limitations
     Most relational databases use traditional B+ trees, which have architectural limitations that become apparent with large data sets or heavy indexing.
     [Chart: InnoDB insert rate vs. row count, inserting rows into a table with 7 indexes (iiBench, 10 clients, insert-only, with secondary indexes). Insert throughput declines as the table grows from ~1M to ~90M rows; elapsed time: 40,600 seconds.]
     Test system: SoftLayer 32-core server, 32GB RAM, 4-drive HDD RAID5, MySQL 5.5.35 on Ubuntu 12.04.
     Key InnoDB parameters:
       innodb_buffer_pool_size=4G
       innodb_flush_log_at_trx_commit=0
       innodb_flush_method=O_DIRECT
       innodb_log_file_size=100M
       innodb_log_files_in_group=2
       innodb_log_buffer_size=16M
  5. Cache Ahead Summary Index (CASI) Tree
     Derived from the classic B+ tree:
     • Embedded statistics and other metadata in the nodes improve both tree navigation and indexing
     • Branch node segments can vary in size based on actual data values
     • Summary nodes provide a mechanism to navigate extremely large tables by minimizing the number of branches walked
     • Wider trees with embedded metadata enhance search and modification operations
     CASI tree instantiations:
     • A CASI tree exists both in memory and on disk for each table and index
     • The structure of the tree on disk and in memory are different
     • The (re)organization of the tree on disk happens asynchronously from the one in memory, based on adaptive algorithms, to yield improved disk I/O and CPU concurrency
     [Diagram: CASI tree with a root node, summary nodes, and branch nodes]
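The summary-node idea above can be sketched in a few lines of Python. This is a hypothetical illustration of summary-guided navigation, not DeepDB's actual data structure: per-segment summary metadata lets a lookup consult a small summary list plus a single segment instead of walking every branch.

```python
from bisect import bisect_right

class SummaryIndex:
    """Toy index: sorted, variable-size segments with per-segment summaries.

    The summary records each segment's smallest key, so a lookup touches
    only the summary list plus one segment, minimizing branches walked.
    """

    def __init__(self, keys, segment_size=4):
        keys = sorted(keys)
        # Variable-size segments are allowed; a fixed size keeps the demo simple.
        self.segments = [keys[i:i + segment_size]
                         for i in range(0, len(keys), segment_size)]
        # Summary metadata: the first key of each segment.
        self.summary = [seg[0] for seg in self.segments]

    def contains(self, key):
        # Navigate via the summary: find the one segment that could hold key.
        i = bisect_right(self.summary, key) - 1
        if i < 0:
            return False
        return key in self.segments[i]

idx = SummaryIndex(range(0, 100, 3))   # keys 0, 3, 6, ..., 99
print(idx.contains(27))  # True
print(idx.contains(28))  # False
```

However many keys are indexed, each lookup inspects one summary list and one segment; real summary indexes add more levels and richer statistics per node.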
  6. CASI Tree Benefits
     • CONSTANT TIME INDEXING – lightning-fast indexing at extreme scale
     • SEGMENTED COLUMN STORE – accelerates analytic operations and data management at scale
     • STREAMING I/O – maximizes disk throughput with highly efficient use of IOPS
     • EXTREME CONCURRENCY – minimizes locks and wait states to maximize CPU throughput
     • INTELLIGENT CACHING – uses adaptive segment sizes and summaries to eliminate many disk reads
     • BUILT FOR THE CLOUD – adaptive configuration and continuous optimization eliminate scheduled downtime
     CASI tree principles:
     • Always try to append data to the file (i.e. don't seek; use the current seek position)
     • Read data sequentially (i.e. don't seek; use the current seek position for the next sequence of reads)
     • Continually re-write and reorder data so that the two principles above are met
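The first two principles can be illustrated with a minimal append-only log, a sketch of the general technique rather than DeepDB code; the record framing here is invented for the demo. Writes always land at the current end of the file, and reads proceed in one sequential pass.

```python
import os, tempfile

# Minimal append-only log: every write lands at the current end of file
# (no seeks on the write path), and readers scan forward sequentially.
path = os.path.join(tempfile.mkdtemp(), "log.bin")

def append(record: bytes) -> int:
    """Append a length-prefixed record; return its starting offset."""
    with open(path, "ab") as f:          # 'ab' => always positioned at end of file
        offset = f.tell()
        f.write(len(record).to_bytes(4, "big") + record)
        return offset

def scan():
    """Read all records back with one sequential pass (no seeking)."""
    records = []
    with open(path, "rb") as f:
        while header := f.read(4):
            records.append(f.read(int.from_bytes(header, "big")))
    return records

append(b"row-1"); append(b"row-2"); append(b"row-3")
print(scan())  # [b'row-1', b'row-2', b'row-3']
```

The third principle, background rewriting and reordering, is what keeps such a log readable sequentially over time; that compaction step is omitted here.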
  7. Constant Time Indexing
     Minimizes index cost, enabling high-performance, heavily indexed tables:
     • Different data structures on disk and in memory
     • All work is performed in constant time, eliminating the need for periodic flushing
     • Streaming file I/O (no memory-map page size limitations)
     In memory: enhanced B+ tree
     • Optimized for 'wide' nodes with accelerated operations
     • Stores index summaries to achieve great scale while maximizing cache effectiveness
     • Values are stored independently of the tree
     • Tree rebalancing occurs only in memory – no impact on data stored on disk
     • No fixed page/block sizes
     On disk: segmented column store
     • Highly optimized for on-disk read/write access
     • Never requires operational/in-place rebalancing
     • All previous database states are available
     • Efficiently supports variable-size keys, values, and 'point reads'
     • Utilizes segmented column store technology for indexes and columns
     Key benefits: Increases maximum practical table sizes and improves analytic performance by allowing for more indexing
  8. Segmented Column Store
     Structure of the index files for the database:
     – Provides the functional capabilities of a column store
     – Simultaneously read- and write-optimized
     – Instantaneous database startup/shutdown
     – Columns are updated in tandem with value changes
     – Consistent performance and latency; optimized in real time
     – Columns consist of variable-length segments
     – Each segment is a block of ordered keys, references to rows, and metadata
     – Changes to the key space require only delta updates
     Optimized for real-time analytics:
     – Embedded statistical data in each segment
     – Allows for heavy indexing to improve query performance
     – Enables a continuous transactional data feed
     Suited for high levels of compression:
     – Compact representation of keys with summarization
     – Flexible segment and delta compression
     [Diagram: segmented column store file layout – header, typed and sized segments with metadata and keys and/or values, plus delta changes back-referencing an earlier segment]
     Key benefits: Excellent compression facilities and improved query performance. Supports continuous streaming backups with snapshots
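How delta updates against immutable segments might work can be sketched as follows. This is a hypothetical illustration using a plain dict as the delta; the deck does not publish DeepDB's on-disk format. The point is that a change to the key space is recorded as a small delta rather than rewriting the base segment in place.

```python
class Segment:
    """Immutable block of ordered keys plus embedded metadata."""
    def __init__(self, rows):
        self.rows = dict(sorted(rows.items()))          # key -> row reference
        self.meta = {"min": min(self.rows), "max": max(self.rows),
                     "count": len(self.rows)}           # embedded statistics

class DeltaSegment:
    """Changes to a base segment recorded as a delta, never in place."""
    def __init__(self, base: Segment):
        self.base = base
        self.delta = {}                   # key -> new row ref (None = delete)

    def put(self, key, ref):
        self.delta[key] = ref

    def get(self, key):
        if key in self.delta:             # delta shadows the base segment
            return self.delta[key]
        return self.base.rows.get(key)

    def compact(self) -> Segment:
        """Merge the delta into a fresh segment (e.g. during reorganization)."""
        merged = {**self.base.rows, **self.delta}
        return Segment({k: v for k, v in merged.items() if v is not None})

base = Segment({1: "r1", 2: "r2", 3: "r3"})
seg = DeltaSegment(base)
seg.put(2, "r2'")                 # update via delta; base stays untouched
print(seg.get(2))                 # r2'
print(base.rows[2])               # r2
print(seg.compact().meta["count"])  # 3
```

Because the base segment never changes, it can be compressed aggressively and backed up while live, which is consistent with the streaming-backup claim above.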
  9. Streaming I/O
     • Massively optimized, delivering near wire-speed throughput
     • Append-only file structures virtually eliminate disk seeks
     • Concurrent operations for updates in memory and on disk
     • Optimizations for SSD, HDD, and in-memory-only operation
     • Minimizes I/O wait states
     [Diagram: data streams feeding DeepDB streaming transactional state logging and streaming indexing]
     Key benefits: Achieves near-SSD performance with magnetic HDDs. Extends the life expectancy of SSDs with built-in wear leveling and no write amplification
  10. Extreme Concurrency
      Running the Sysbench test on a 32-CPU-core system with 32 attached clients:
      InnoDB – strands system resources and takes longer to complete the test:
      • Load time: 8m59s
      • Test time: 54.09s
      • Transaction rate: 1.4k/sec
      DeepDB – utilizes ~100% of available system resources to complete the test:
      • Load time: 23.96s
      • Test time: 5.82s
      • Transaction rate: 15k/sec
      Key benefits: Database operations take full advantage of all allocated system resources, dramatically improving system performance
  11. Intelligent Caching
      • Adaptive algorithms manage cache usage
        – Dynamically sized data segments
        – Point-read capable: no page operations
      • In-memory compression
        – Maximizes cache effectiveness
        – Adaptive operation manages compression vs. performance
      • Summary indexing reduces cache 'thrashing'
        – Only pull in the data that is relevant
        – No need to pull 'pages' into cache
      Key benefits: Improves overall system performance by staying in cache more often than standard MySQL
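The "only pull in the data that is relevant" idea can be sketched as a toy point-read cache. This is an invented illustration, not DeepDB's cache: per-segment (min, max) summaries filter lookups so that only the one relevant variable-size segment is ever fetched, and misses outside all summaries touch disk zero times.

```python
class SegmentCache:
    """Toy point-read cache: summaries decide which variable-size
    segment to fetch, so irrelevant 'pages' are never pulled in."""

    def __init__(self, segments_on_disk):
        self.disk = segments_on_disk                       # list of sorted key lists
        self.summary = [(s[0], s[-1]) for s in self.disk]  # (min, max) per segment
        self.cache = {}                                    # segment index -> segment
        self.disk_reads = 0

    def lookup(self, key):
        for i, (lo, hi) in enumerate(self.summary):
            if lo <= key <= hi:                  # summary filter: one candidate
                if i not in self.cache:          # fetch only the relevant segment
                    self.cache[i] = self.disk[i]
                    self.disk_reads += 1
                return key in self.cache[i]
        return False                             # ruled out without any disk read

cache = SegmentCache([[1, 2, 5], [10, 11, 19], [40, 41, 42]])
print(cache.lookup(11))    # True  (one segment fetched)
print(cache.lookup(25))    # False (no fetch: summaries rule it out)
print(cache.disk_reads)    # 1
```

A fixed-page cache would have to load a whole page to answer either lookup; here the second query is answered from summaries alone.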
  12. Built for the Cloud
      • Designed for easy deployments, with virtually no configuration required in most cases
      • No off-line operations
        – Continuous defragmentation and optimization
        – No downtime for scheduled maintenance
      • Linear performance and consistent low latency
      • Instantaneous startup and shutdown
      • No performance degradation due to B+ tree rebalancing or log flushing
      Key benefits: Rapid deployment with almost no configuration and no off-line maintenance operations. Delivers greatly enhanced performance when using network-based storage
  13. DeepDB for MySQL
      A storage engine that breaks through current performance and scaling limitations:
      – Easy-to-install plugin replacement for the InnoDB storage engine
      – Requires no application or schema changes
      – Scales up performance of existing systems
      – Increases practical data sizes and complexity
      – Billions of rows with high index densities
      – High-performance index creation/maintenance
      – High-performance ACID transactions with consistently low latency
      – Reduced query latencies
      [Diagram: application stack – Wordpress | SugarCRM | Drupal; PHP | Perl | Python | etc.; Apache Server; MySQL with DeepDB or InnoDB; CentOS | RHEL | Ubuntu; bare metal | virtualized | cloud]
  14. Benefits the Entire Data Lifecycle
      Load    – delimited files, dump files
      Operate – transactions, compression
      Analyze – replication, queries
      Protect – backup, recovery
      DeepDB:
      • Provides enhanced scaling and performance across a broad set of use cases
      • Compatible with all existing MySQL applications and tool chains
      • Designed to fully leverage today's powerful computing systems
      • Optimized for deployment in the cloud with adaptive behavior and on-line maintenance
  15. Data Loading
      DeepDB reduces data loading times by 20x or more. Whether you are loading delimited files or restoring MySQL dump files, DeepDB can dramatically reduce your load times. DeepDB's data loading advantage can be seen in both dedicated bare-metal and cloud-based deployments.
  16. Transactional Performance Use Cases (all tests performed on MySQL 5.5)

      Test                                                  MySQL w/ DeepDB   MySQL w/ InnoDB   Improvement
      Streaming data test (machine-to-machine):
        iiBench max transactions/sec, single index          3.795M/sec        217k/sec          17x
      Transactional workload test (financial):
        Sysbench transaction rate                           15,083/sec        1,381/sec         11x
      Complex transactional test (e-commerce):
        DBT-2 transaction rate using HDD                    205,184/min       15,086/min        13.6x
      Social media transactional test (Twitter):
        iiBench, 250M rows, 7 indexes w/ composite keys
        Database creation                                   15 minutes        24 hours          96x
        First query from cold start                         50 seconds        5.5 minutes       6.6x
        Second query from cold start                        1 second          240 seconds       240x
        Disk storage footprint (uncompressed)               29GB              50GB              42% smaller
  17. Advantage in the Cloud
  18. Reduces Disk Size Requirements
      [Chart: on-disk data size in GB, InnoDB vs. DeepDB]
      Uncompressed: InnoDB 5,400 GB vs. DeepDB 2,800 GB
      Compressed:   InnoDB 3,780 GB vs. DeepDB 640 GB
  19. Cut Your Query Times in Half
      DeepDB improves query speed by 1.5 to 2 times when measured against the DBT3 benchmark.
      DBT3 performance comparison summary – average query performance (times faster than InnoDB) across various configurations:
      • SF=1, 2G, avg of 2 runs: 1.75
      • SF=1, 16G, avg of 5 runs: 1.86
      • SF=1, 16G, key compression, avg of 5 runs: 2.00
      • SF=2, 16G, avg of 5 runs: 1.88
      • SF=2, 16G, key compression, avg of 5 runs: 1.93
      • SF=5, 16G, avg of 2 runs: 2.06
      • SF=5, 16G, key compression, avg of 5 runs: 1.62
      • Overall average: 1.87
  20. Protect Your Data
      DeepDB architecture eliminates potential data integrity problems, and patent-pending error recovery completes in just seconds:
      • No updates in place
      • No memory map
      Unique data structures support real-time and continuous streaming backups to ensure data is always protected:
      • Append-only files provide natural incremental backups
      DeepDB ensures your data is continually backed up and available
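Why append-only files give "natural incremental backups" can be shown with a small sketch (illustrative only, using an in-memory file): because committed data is only ever appended, an incremental backup simply copies the bytes written since the last backed-up offset.

```python
import io

def incremental_backup(data_file: io.BytesIO, backup: bytearray, state: dict):
    """Copy only the bytes appended since the last backup; return count copied."""
    start = state.get("offset", 0)
    data_file.seek(start)
    chunk = data_file.read()          # everything appended since `start`
    backup.extend(chunk)
    state["offset"] = start + len(chunk)
    return len(chunk)

db = io.BytesIO()
backup, state = bytearray(), {}

db.seek(0, io.SEEK_END); db.write(b"txn-1;")
print(incremental_backup(db, backup, state))   # 6 bytes copied

db.seek(0, io.SEEK_END); db.write(b"txn-2;")
print(incremental_backup(db, backup, state))   # 6 more bytes
print(backup == db.getvalue())                 # True
```

With an update-in-place engine, the backup tool cannot know which earlier bytes changed and must rescan or diff; with append-only files the changed region is always the tail.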
  21. DeepDB Advantages – The Ultimate MySQL Storage Engine
      • 50% smaller data footprint – reduces compressed or uncompressed data to less than half the size of InnoDB
      • 5x-10x improvement in ACID transactional throughput
      • Plug-in replacement for InnoDB – install DeepDB without any changes to existing MySQL applications
      • HDD=SSD – increases effective HDD throughput to near-SSD levels and extends SSD life up to 10x
      • 1B+ rows – provides high-performance support for very large tables
      • 20x faster data loading – concurrent operations and I/O optimizations reduce load times
      • Run queries twice as fast – summary indexing techniques enable ultra-low-latency queries
      • Real-time backups – create streaming backups with snapshotting
      • Low-latency replicas – efficiently scale out analytics and read-heavy workloads
  22. Try DeepDB yourself!
  23. Thank You!