11. In-memory benchmark
Transactions per second for reads (oltp.lua, read-only)
[Bar chart: Transactions Per Second (Read-Only), y-axis TPS 0-8000, for mysql-uncompressed, mysql-compressed, fb-mysql-uncompressed, fb-mysql-compressed]
12. In-memory benchmark
Inserts per second (insert.lua)
[Bar chart: Inserts Per Second, y-axis IPS 0-60000, for mysql-uncompressed, mysql-compressed, fb-mysql-uncompressed, fb-mysql-compressed (4X)]
13. IO-bound benchmark for inserts
Inserts per second (insert.lua)
[Bar chart: Inserts Per Second, y-axis IPS 0-60000, for mysql-uncompressed, fb-mysql-uncompressed]
15. InnoDB Compression
Basics
▪ 16K pages are compressed to 1K, 2K, 4K, or 8K blocks.
▪ Block size is specified during table creation.
▪ 8K is safest if data is not too compressible.
▪ BLOBs and VARCHARs increase compressibility.
▪ In-memory workloads may require larger buffer pool.
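The choice of block size hinges on how well the data compresses: a page that does not fit its block triggers a compression failure and a page split. A quick way to get intuition for this is to compress sample page-sized data with zlib (the library InnoDB uses) and see whether it fits a given block size. The function name and the flat 128-byte overhead below are this sketch's own assumptions, not InnoDB's actual layout.

```python
import os
import zlib

PAGE_SIZE = 16 * 1024

def fits_compressed(page: bytes, block_size: int) -> bool:
    """Return True if a 16K page would fit the given compressed block size.

    A flat 128 bytes is reserved for headers and the modification log;
    this overhead is an illustrative assumption for the sketch.
    """
    overhead = 128
    return len(zlib.compress(page, 6)) + overhead <= block_size

# Highly repetitive data fits easily into an 8K block...
repetitive = b"ab" * (PAGE_SIZE // 2)
# ...while random (incompressible) data does not, which is why 8K is the
# safest choice when the data is not very compressible.
random_page = os.urandom(PAGE_SIZE)
```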
16. InnoDB Compression
Example
CREATE TABLE `sbtest1` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`k` int(10) unsigned NOT NULL DEFAULT '0',
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `k_1` (`k`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED
KEY_BLOCK_SIZE=8;
17. InnoDB Compression
Page Modification Log (mlog)
▪ InnoDB does not recompress a page on every update.
▪ Updates are appended to the modification log.
▪ The mlog is located at the bottom of the compressed page.
▪ When mlog is full, page is recompressed.
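The mlog mechanism can be sketched as a buffer of pending updates that is only folded into the compressed image once it fills up. The class below is a toy model of this idea; the names, sizes, and record layout are illustrative, not InnoDB's actual structures.

```python
import zlib

class CompressedPage:
    """Toy model of InnoDB's page modification log (mlog).

    Updates are appended to a log instead of recompressing the whole
    page; only when the log is full is the page rebuilt and recompressed.
    """

    def __init__(self, records, mlog_capacity=512):
        self.records = list(records)
        self.mlog_capacity = mlog_capacity
        self.mlog = []
        self.recompress_count = 0
        self._recompress()

    def _recompress(self):
        # Apply the buffered updates, then compress the full record set.
        for idx, value in self.mlog:
            self.records[idx] = value
        self.mlog.clear()
        self.zdata = zlib.compress(b"".join(self.records))
        self.recompress_count += 1

    def update(self, idx, value):
        # Cheap path: just append to the modification log.
        self.mlog.append((idx, value))
        # Expensive path, taken only when the mlog is full.
        if len(self.mlog) >= self.mlog_capacity:
            self._recompress()
```

With a capacity of 512 entries, 512 consecutive updates cost one recompression instead of 512, which is the whole point of the mechanism.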
23. InnoDB Compression
Unzip LRU
▪ A compressed block is decompressed when it is read.
▪ Both the compressed and the uncompressed copies are kept in memory.
▪ Any update on the page is applied to both of the copies.
▪ When it is time to evict a page:
▪ Evict an uncompressed copy if the system is IO-bound.
▪ Evict a page from the normal LRU if the system is CPU-bound.
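The eviction choice above can be sketched as a simple policy function. The rationale: when IO-bound, memory is the scarce resource, so drop an uncompressed copy (its compressed twin stays cached, so no disk read is needed to bring it back); when CPU-bound, decompression is the scarce resource, so evict from the normal LRU instead. The function and its list arguments are stand-ins for buffer-pool structures, not InnoDB's actual code.

```python
def choose_eviction(io_bound, unzip_lru, lru):
    """Pick a victim following the unzip-LRU heuristic (sketch).

    unzip_lru holds pages with an uncompressed copy in memory;
    lru is the normal buffer-pool LRU list.
    """
    if io_bound and unzip_lru:
        # Free memory but keep the compressed copy cached.
        return ("uncompressed", unzip_lru.pop())
    if lru:
        # CPU-bound (or nothing to unzip-evict): evict a whole page.
        return ("page", lru.pop())
    return None
```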
24. InnoDB Compression
Compressed pages written to redo log
▪ Compressed pages are written to redo log.
▪ Reasons for doing this:
▪ Reuse redo logs even if the zlib version changes.
▪ Protect against nondeterminism in compression.
▪ Increase in redo log writes.
▪ Increase in checkpoint frequency.
25. InnoDB Compression
Official advice on tuning compression
If the number of “successful” compression operations
(COMPRESS_OPS_OK) is a high percentage of the total
number of compression operations (COMPRESS_OPS), then the
system is likely performing well. If the ratio is low, then InnoDB is
reorganizing, recompressing, and splitting B-tree nodes more
often than is desirable. In this case, avoid compressing some
tables, or increase KEY_BLOCK_SIZE for some of the
compressed tables. You might turn off compression for tables
that cause the number of “compression failures” in your
application to be more than 1% or 2% of the total. (Such a failure
ratio might be acceptable during a temporary operation such as a
data load).
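The manual's rule of thumb reduces to a ratio check over the counters it names. The helper below applies it; the function name, signature, and default 2% threshold are this sketch's own choices (the manual gives a 1-2% range).

```python
def compression_ok(compress_ops, compress_ops_ok, max_failure_pct=2.0):
    """Apply the manual's rule of thumb to COMPRESS_OPS counters.

    A "compression failure" is an operation whose output did not fit the
    block size (COMPRESS_OPS - COMPRESS_OPS_OK), which forces InnoDB to
    split the node and recompress.
    """
    if compress_ops == 0:
        return True  # no compression activity, nothing to flag
    failures = compress_ops - compress_ops_ok
    return 100.0 * failures / compress_ops <= max_failure_pct
```

If this returns False for a table, the advice above applies: leave that table uncompressed or raise its KEY_BLOCK_SIZE.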
27. Facebook Improvements
Finding bugs and testing new features
▪ Expanded mtr test suite with crash-recovery and stress tests.
▪ Simulate compression failures.
▪ Fixed the bugs revealed by the tests and production servers.
28. Facebook Improvements
Table level compression statistics
▪ Added the following columns to table_statistics:
▪ COMPRESS_OPS,
▪ COMPRESS_OPS_OK,
▪ COMPRESS_USECS,
▪ UNCOMPRESS_OPS,
▪ UNCOMPRESS_USECS.
29. Facebook Improvements
Removal of compressed pages from redo log
▪ Removed compressed page images from redo log.
▪ Introduced a new log record for compression.
30. Facebook Improvements
Adaptive padding
▪ Put less data on each page to prevent compression failures.
▪ pad = 16K – (maximum data size allowed on the uncompressed copy)
33. Facebook Improvements
Adaptive padding
▪ Algorithm to determine pad per table:
▪ Increase the padding until the compression failure rate falls to the specified level.
▪ Decrease padding if the failure rate is too low.
▪ Adapts to the compressibility of data over time.
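The feedback loop described above can be sketched as one adjustment step driven by the observed failure rate. The step size, bounds, and 5% target below are illustrative choices for this sketch, not the values used in fb-mysql.

```python
def adapt_padding(pad, failure_rate, target=0.05, step=128, max_pad=4096):
    """One step of the adaptive-padding feedback loop (sketch).

    pad is the number of bytes withheld from the uncompressed copy, i.e.
    pad = 16K - (maximum data size allowed on the uncompressed page).
    """
    if failure_rate > target and pad + step <= max_pad:
        return pad + step  # too many failures: put less data on the page
    if failure_rate < target / 2 and pad >= step:
        return pad - step  # comfortably low: reclaim some space
    return pad             # within the acceptable band: leave it alone
```

Running this per table after each observation window lets the padding track the compressibility of the data over time, as the slide describes.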
34. Facebook Improvements
Adaptive padding on insert benchmark
▪ Padding value for sbtable is 2432.
▪ Compression failure rate:
▪ mysql: 41%.
▪ fb-mysql: 5%.
[Bar chart: Inserts Per Second, y-axis 0-35000, for mysql-compressed, fb-mysql-compressed]
36. Facebook Improvements
Time spent for compression ops in insert benchmark
[Bar chart: compress_time(s) and decompress_time(s), y-axis 0-1200 seconds, for mysql-compressed, fb-mysql-compressed]
37. Facebook Improvements
Other improvements
▪ Reduced the share of empty allocated pages from 10-15% to 2-5%.
▪ Cache memory allocations for:
▪ compression buffers,
▪ decompression buffers,
▪ buffer page descriptors.
▪ Hardware accelerated checksum for compressed pages.
▪ Remove adler32 calls from zlib functions.
38. Facebook Improvements
Future work
▪ Make page_zip_compress() more efficient.
▪ Test larger page sizes: 32K, 64K.
▪ Prefix compression.
▪ Other compression algorithms: snappy, QuickLZ, etc.
▪ 3X compression in production.
Introduction, interruptions ok, questions at the end.
Use existing servers for a longer time.
Linear growth until the first arrow. Drops correspond to compression of servers in batches. Percentages are computed by comparing the current size with the predicted uncompressed size.
For reference, these 3 arrows correspond to the same times as previous arrows.
I chose sysbench because it's a common benchmark framework.
We could guess that this table would be compressible even before looking at the data.
Grabbed the latest 5.1 source code from Launchpad. Four versions: stock MySQL uncompressed, stock MySQL compressed, MySQL with the Facebook patch uncompressed, MySQL with the Facebook patch compressed.
Note that even though compressed mysql with fb patch has higher throughput, it doesn’t increase the disk space used by the database in this case.
Just making sure the read-only perf is ok.
This is the main difference in terms of performance.
The results are not peculiar to In-Memory workloads.
A naïve way to implement compression: compress before flushing to disk. A less naïve but inefficient way: keep a compressed copy in memory and recompress on every update. What InnoDB does: a modification log. Note that this would not be necessary for LSM-based architectures.
Mention the assumptions about the compressibility of a page. Master-slave method for checking consistency.