Getting InnoDB Compression Ready for Facebook Scale




  1. InnoDB Compression: Getting it ready for Facebook scale. Nizam Ordulu, Software Engineer, database engineering @Facebook, 4/11/12
  2. Why use compression
  3. Why use compression ▪ Save disk space. ▪ Buy fewer servers. ▪ Buy better disks (SSD) without too much increase in cost. ▪ Reduce IOPS.
  4. Database Size
  5. IOPS
  6. Sysbench Benchmarks
  7. Sysbench: Default table schema for sysbench

         CREATE TABLE `sbtest` (
           `id` int(10) unsigned NOT NULL auto_increment,
           `k` int(10) unsigned NOT NULL default '0',
           `c` char(120) NOT NULL default '',
           `pad` char(60) NOT NULL default '',
           PRIMARY KEY (`id`),
           KEY `k` (`k`)
         );

  8. In-memory benchmark: Configuration ▪ Buffer pool size = 1G. ▪ 16 tables. ▪ 250K rows in each table. ▪ Uncompressed db size = 1.1G. ▪ Compressed db size = 600M. ▪ In-memory benchmark. ▪ 16 threads.
  9. In-memory benchmark: Load time. [Bar chart: Time (s), y-axis 0 to 80, for mysql-uncompressed, mysql-compressed, fb-mysql-uncompressed, fb-mysql-compressed.]
  10. In-memory benchmark: Database size after load. [Bar chart: Size (M), y-axis 0 to 1200, for mysql-uncompressed, mysql-compressed, fb-mysql-uncompressed, fb-mysql-compressed.]
  11. In-memory benchmark: Transactions per second for reads (oltp.lua, read-only). [Bar chart: TPS, y-axis 0 to 8000, for mysql-uncompressed, mysql-compressed, fb-mysql-uncompressed, fb-mysql-compressed.]
  12. In-memory benchmark: Inserts per second (insert.lua). [Bar chart: IPS, y-axis 0 to 60000, for mysql-uncompressed, mysql-compressed, fb-mysql-uncompressed, fb-mysql-compressed; the fb-mysql-compressed bar is annotated "(4X)".]
  13. IO-bound benchmark for inserts: Inserts per second (insert.lua). [Bar chart: IPS, y-axis 0 to 60000, for mysql-uncompressed, fb-mysql-uncompressed.]
  14. InnoDB Compression
  15. InnoDB Compression: Basics ▪ 16K pages are compressed to 1K, 2K, 4K, or 8K blocks. ▪ Block size is specified during table creation. ▪ 8K is safest if data is not too compressible. ▪ Blobs and varchars increase compressibility. ▪ In-memory workloads may require a larger buffer pool.
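
To make the block-size choice on the previous slide concrete, here is a minimal Python sketch. It is not InnoDB code: it simply runs zlib over a 16K buffer and reports the smallest of the listed block sizes that would hold the result. The function name `smallest_fitting_block` and the sample pages are hypothetical, and the model ignores page headers and the space a real compressed page reserves for its modification log.

```python
import os
import zlib

PAGE_SIZE = 16 * 1024  # uncompressed page size quoted on the slide


def smallest_fitting_block(page: bytes, block_sizes=(1024, 2048, 4096, 8192)):
    """Return the smallest block size (bytes) that holds the zlib-compressed
    page, or None if it fits in none of them (a failure in this toy model)."""
    compressed_len = len(zlib.compress(page))
    for size in sorted(block_sizes):
        if compressed_len <= size:
            return size
    return None


# Hypothetical sample data: one highly repetitive page, one random page.
repetitive_page = (b"sysbench-row-padding " * 1000)[:PAGE_SIZE]
random_page = os.urandom(PAGE_SIZE)

print(smallest_fitting_block(repetitive_page))
print(smallest_fitting_block(random_page))
```

Repetitive data fits the smallest block, while random data fits none, which is why a larger block size is the cautious choice when compressibility is unknown.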
  16. InnoDB Compression: Example

         CREATE TABLE `sbtest1` (
           `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
           `k` int(10) unsigned NOT NULL DEFAULT '0',
           `c` char(120) NOT NULL DEFAULT '',
           `pad` char(60) NOT NULL DEFAULT '',
           PRIMARY KEY (`id`),
           KEY `k_1` (`k`)
         ) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

  17. InnoDB Compression: Page Modification Log (mlog) ▪ InnoDB does not recompress a page on every update. ▪ Updates are appended to the modification log. ▪ The mlog is located at the bottom of the compressed page. ▪ When the mlog is full, the page is recompressed.
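
The modification-log idea can be sketched as a toy Python model. This is an illustration under simplifying assumptions, not InnoDB's real page layout: the class `ToyCompressedPage`, its fields, and the rule that the mlog uses whatever space the compressed data leaves free in the block are all hypothetical.

```python
import zlib


class ToyCompressedPage:
    """Toy model: compressed data and an append-only mlog share one block."""

    def __init__(self, records, block_size=8192):
        self.block_size = block_size
        self.records = list(records)  # stands in for the uncompressed copy
        self.mlog = []                # updates appended since last compression
        self.recompressions = 0
        self._compress()

    def _compress(self):
        # Compress all records; failing to fit models a compression failure.
        self.compressed = zlib.compress(b"".join(self.records))
        if len(self.compressed) > self.block_size:
            raise ValueError("compression failure: page does not fit in block")
        self.mlog = []                # recompression empties the mlog

    def _mlog_free(self):
        used = len(self.compressed) + sum(len(r) for r in self.mlog)
        return self.block_size - used

    def insert(self, record: bytes):
        self.records.append(record)
        if len(record) <= self._mlog_free():
            self.mlog.append(record)  # cheap path: append to the mlog
        else:
            self._compress()          # mlog full: recompress the whole page
            self.recompressions += 1


page = ToyCompressedPage([b"a" * 100], block_size=256)
for _ in range(3):
    page.insert(b"b" * 100)
print(page.recompressions, len(page.mlog))
```

In this model the first two inserts only append to the mlog; the third no longer fits, so the page is recompressed once and the mlog is emptied.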
  18. InnoDB Compression: Page Modification Log Example
  19. InnoDB Compression: Page Modification Log Example
  20. InnoDB Compression: Page Modification Log Example
  21. InnoDB Compression: Page Modification Log Example
  22. InnoDB Compression: Compression failures are bad ▪ Compression failures: ▪ waste CPU cycles, ▪ cause mutex contention.
  23. InnoDB Compression: Unzip LRU ▪ A compressed block is decompressed when it is read. ▪ Both the compressed and the uncompressed copy are kept in memory. ▪ Any update on the page is applied to both copies. ▪ When it is time to evict a page: ▪ Evict an uncompressed copy if the system is IO-bound. ▪ Evict a page from the normal LRU if the system is CPU-bound.
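
The eviction rule on this slide can be sketched in a few lines of Python. This is a simplified illustration, not the InnoDB buffer pool: the function `choose_eviction`, the two `OrderedDict` lists, and the `io_bound` flag passed in by the caller are assumptions made for the example, and how the server decides whether it is IO-bound or CPU-bound is not modeled.

```python
from collections import OrderedDict


def choose_eviction(unzip_lru: OrderedDict, lru: OrderedDict, io_bound: bool):
    """Pick a victim; both lists are ordered oldest first.

    Returns ("uncompressed_copy", page_id) or ("whole_page", page_id).
    """
    if io_bound and unzip_lru:
        # IO-bound: drop only the decompressed copy; the compressed copy
        # stays cached, so more pages fit in memory.
        page_id, _ = unzip_lru.popitem(last=False)
        return ("uncompressed_copy", page_id)
    # CPU-bound: evict the oldest page from the normal LRU entirely,
    # including its decompressed copy if one exists.
    page_id, _ = lru.popitem(last=False)
    unzip_lru.pop(page_id, None)
    return ("whole_page", page_id)


unzip = OrderedDict([(1, "u1"), (2, "u2")])
lru = OrderedDict([(3, "p3"), (1, "p1"), (2, "p2")])
print(choose_eviction(unzip, lru, io_bound=True))
print(choose_eviction(unzip, lru, io_bound=False))
```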
  24. InnoDB Compression: Compressed pages written to redo log ▪ Compressed pages are written to the redo log. ▪ Reasons for doing this: ▪ Reuse redo logs even if the zlib version changes. ▪ Protect against nondeterminism in compression. ▪ Downsides: ▪ Increase in redo log writes. ▪ Increase in checkpoint frequency.
  25. InnoDB Compression: Official advice on tuning compression. If the number of “successful” compression operations (COMPRESS_OPS_OK) is a high percentage of the total number of compression operations (COMPRESS_OPS), then the system is likely performing well. If the ratio is low, then InnoDB is reorganizing, recompressing, and splitting B-tree nodes more often than is desirable. In this case, avoid compressing some tables, or increase KEY_BLOCK_SIZE for some of the compressed tables. You might turn off compression for tables that cause the number of “compression failures” in your application to be more than 1% or 2% of the total. (Such a failure ratio might be acceptable during a temporary operation such as a data load.)
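
As a worked illustration of the advice above, the failure ratio can be computed from the two counters it names. This is a minimal Python sketch; the function names and the 2% default threshold are assumptions chosen to mirror the "1% or 2%" guideline, and the counter values would have to be read from the server separately.

```python
def compression_failure_rate(compress_ops: int, compress_ops_ok: int) -> float:
    """Fraction of compression operations that failed."""
    if compress_ops == 0:
        return 0.0  # no compression attempted, nothing to report
    return (compress_ops - compress_ops_ok) / compress_ops


def should_review_table(compress_ops: int, compress_ops_ok: int,
                        threshold: float = 0.02) -> bool:
    """True if the failure rate exceeds the (assumed) 2% guideline."""
    return compression_failure_rate(compress_ops, compress_ops_ok) > threshold


# Hypothetical counter values for two tables.
print(should_review_table(1000, 590))  # 41% failures
print(should_review_table(1000, 990))  # 1% failures
```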
  26. Facebook Improvements
  27. Facebook Improvements: Finding bugs and testing new features ▪ Expanded mtr test suite with crash-recovery and stress tests. ▪ Simulate compression failures. ▪ Fixed the bugs revealed by the tests and production servers.
  28. Facebook Improvements: Table level compression statistics ▪ Added the following columns to table_statistics: ▪ COMPRESS_OPS, ▪ COMPRESS_OPS_OK, ▪ COMPRESS_USECS, ▪ UNCOMPRESS_OPS, ▪ UNCOMPRESS_USECS.
  29. Facebook Improvements: Removal of compressed pages from redo log ▪ Removed compressed page images from redo log. ▪ Introduced a new log record for compression.
  30. Facebook Improvements: Adaptive padding ▪ Put less data on each page to prevent compression failures. ▪ pad = 16K - (maximum data size allowed on the uncompressed copy)
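
The pad formula can be rearranged to give the data limit for a given pad. A small Python sketch, assuming 16K means 16384 bytes and that the pad is expressed in bytes; the helper name `max_data_size` is hypothetical. The pad of 2432 is the value reported for the insert benchmark later in the deck.

```python
PAGE_SIZE = 16 * 1024  # 16K uncompressed page, assumed to be 16384 bytes


def max_data_size(pad: int) -> int:
    """Maximum data allowed on the uncompressed copy for a given pad."""
    if not 0 <= pad < PAGE_SIZE:
        raise ValueError("pad must be in [0, PAGE_SIZE)")
    return PAGE_SIZE - pad


print(max_data_size(2432))
```

With a pad of 2432, at most 13952 bytes of each uncompressed page are filled with data.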
  31. Facebook Improvements: Adaptive padding
  32. Facebook Improvements: Adaptive padding
  33. Facebook Improvements: Adaptive padding ▪ Algorithm to determine pad per table: ▪ Increase the pad until the compression failure rate drops to the specified level. ▪ Decrease the pad if the failure rate is too low. ▪ Adapts to the compressibility of data over time.
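
The per-table feedback loop described above might look roughly like the following Python sketch. It is an interpretation of the slide, not the Facebook patch: the function `adjust_pad`, the step size, the two rate thresholds, and the cap on the pad are all hypothetical values picked for illustration.

```python
PAGE_SIZE = 16 * 1024


def adjust_pad(pad: int, failures: int, ops: int,
               target_rate: float = 0.05,   # assumed acceptable failure rate
               low_rate: float = 0.01,      # assumed "too low" failure rate
               step: int = 128,             # assumed adjustment step in bytes
               max_pad: int = PAGE_SIZE // 2) -> int:
    """Return the new pad after observing one window of compression ops."""
    if ops == 0:
        return pad                       # nothing observed, keep the pad
    rate = failures / ops
    if rate > target_rate:
        return min(pad + step, max_pad)  # too many failures: leave more room
    if rate < low_rate:
        return max(pad - step, 0)        # very few failures: reclaim space
    return pad                           # within the band: leave unchanged


pad = 0
# Hypothetical windows of (failures, ops) observed over time.
for failures, ops in [(41, 100), (30, 100), (12, 100), (3, 100), (0, 100)]:
    pad = adjust_pad(pad, failures, ops)
print(pad)
```

Calling the function once per observation window lets the pad follow the data: it grows while failures are frequent and shrinks again if the data becomes more compressible.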
  34. Facebook Improvements: Adaptive padding on insert benchmark ▪ Padding value for sbtable is 2432. ▪ Compression failure rate: ▪ mysql: 41%. ▪ fb-mysql: 5%. [Bar chart: Inserts Per Second, y-axis 0 to 35000, for mysql-compressed and fb-mysql-compressed.]
  35. Facebook Improvements: Compression ops in insert benchmark. [Bar chart: compress_ops_ok and compress_ops_fail, y-axis 0 to 1400000, for mysql-compressed and fb-mysql-compressed.]
  36. Facebook Improvements: Time spent for compression ops in insert benchmark. [Bar chart: compress_time(s) and decompress_time(s), y-axis 0 to 1200, for mysql-compressed and fb-mysql-compressed.]
  37. Facebook Improvements: Other improvements ▪ Reduced the amount of empty allocated pages from 10-15% to 2-5%. ▪ Cache memory allocations for: ▪ compression buffers, ▪ decompression buffers, ▪ buffer page descriptors. ▪ Hardware-accelerated checksum for compressed pages. ▪ Remove adler32 calls from zlib functions.
  38. Facebook Improvements: Future work ▪ Make page_zip_compress() more efficient. ▪ Test larger page sizes: 32K, 64K. ▪ Prefix compression. ▪ Other compression algorithms: Snappy, QuickLZ, etc. ▪ 3X compression in production.
  39. Questions

Editor's Notes

  • Introduction, interruptions ok, questions at the end.
  • Use existing servers for a longer time.
  • Linear growth until the first arrow. Drops correspond to compression of servers in batches. Percentages are computed from the current size and the predicted uncompressed size.
  • For reference, these 3 arrows correspond to the same times as previous arrows.
  • I chose sysbench because it’s a common benchmark framework.
  • We could guess that this table would be compressible even before looking at the data.
  • Grabbed the latest 5.1 source code from Launchpad. 4 versions: stock mysql uncompressed, stock mysql compressed, mysql with fb patch uncompressed, mysql with fb patch compressed.
  • Note that even though compressed mysql with fb patch has higher throughput, it doesn’t increase the disk space used by the database in this case.
  • Just making sure the read-only perf is ok.
  • This is the main difference in terms of performance.
  • The results are not peculiar to in-memory workloads.
  • Naïve way to implement compression: compress before flushing to disk. A less naïve but inefficient way: keep a compressed copy in memory and recompress on every update. What InnoDB does: modification log. Note that this would not be necessary for LSM-based architectures.
  • Mention the assumptions about the compressibility of a page. Master-slave method for checking consistency.