Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Ужимай и властвуй алгоритмы компрессии в базах данных / Петр Зайцев (Percona)


Published on

Чем меньше размер данных, тем дешевле их хранить и часто быстрее обрабатывать. Разработчики баз данных издавна задумывались над тем, как обеспечить максимальную степень сжатия данных.
В данном докладе мы рассмотрим, почему интерес к компрессии информации в базах данных особенно высок в последние годы. Мы также рассмотрим различные подходы к уменьшению размера хранимых данных, включая:
+ дедубликацию данных;
+ префикс-компрессию индексов;
+ компрессию на уровне файловой системы;
+ поколоночное хранение данных;
+ постраничную компрессию (на примере хранилища Innodb);
+ компрессию во фрактальных и LSM деревьях;
+ компрессию путем "прокалывания дырок" в файлах;
+ компрессию данных на уровне пользователя.

Published in: Engineering
  • Be the first to comment

  • Be the first to like this

Ужимай и властвуй алгоритмы компрессии в базах данных / Петр Зайцев (Percona)

  1. 1. Compression in Open Source Databases Peter Zaitsev CEO, Percona MySQL Central @ OOW 26 Oct 2015
  2. 2. Few Words About Percona Your Partner in MySQL and MongoDB Success 100% Open Source Software “No Lock in Required” Solutions and Services We work with MySQL, MariaDB, MongoDB, Amazon RDS and Aurora 2
  3. 3. About the Talk A bit of the History Approaches to Data Compression What some of the popular systems implement 3
  4. 4. Lets Define The Term Compression - Any Technique to make data size smaller 4
  5. 5. A bit of History Early Computers were too slow to compress data in Software Hardware Encryption (ie Tape) Compression first appears for non performance critical data 5
  6. 6. We did not need it much for space… 6
  7. 7. Welcome to the modern age Data Growth outpaces HDD improvements Powerful CPUs Flash Cloud Data we store now 7
  8. 8. Exponential Data Size Growth 8
  9. 9. Powerful CPUs High Performance Multiple Cores 9
  10. 10. Can Compress and Decompress Fast! •Up to 1GB/sec compression •Up to 2GB/sec decompression Snappy, LZ4 10
  11. 11. Flash Disk space is more costly than for HDDs Write Endurance is expensive Want to write less data Decent at handling fragmentation 11
  12. 12. Cloud Pay for Space Pay for IOPS More limited Storage Performance Network Performance may be limited 12
  13. 13. Data we store in Databases •Text •JSON •XML •Time Series Data •Log Files Modern Data Compresses Well! 13
  14. 14. COMPRESSION BASICS Introduction into a ways of making your data smaller 14
  15. 15. Lossy and Lossless Database generally use Lossless Compression Lossy compression done on the application level 15
  16. 16. Some ways of getting data smaller Layout Optimizations “Encoding” Dictionary Compression Block Compression 16
  17. 17. Layout Optimizations Column Store versus Row Store Hybrid Formats Variable Block Sizes 17
  18. 18. Encoding Depends on Data Type and Domain Delta Encoding, Run Length Encoding (RLE) Can be faster than read of uncompressed data UTF8 (strings) and VLQ (Integers) Index Prefix Compression 18
  19. 19. Dictionary Compression Replacing frequent values with Dictionary Pointers Kind of like STL String ENUM type in MySQL 19
  20. 20. Block Compression Compress “block” of data so it is smaller for storage Finding Patterns in Data and Efficiently encoding them Many Algorithms Exist: Snappy, Zlib, LZ4, LZMA 20
  21. 21. Block Compression Details Compression rate highly depends on data Compression rate depends on block size Speed depends on block size and data 21
  22. 22. Block Size Dependence (by Leif Walsh) 22
  23. 23. There is no one size fits all Typically Compression Algorithm can be selected Often with additional settings 23
  24. 24. WHERE AND HOW Where do we compress data and how do we do that 24
  25. 25. Where to Compress Data In Memory ? In the Database Data Store ? As Part of File System ? Storage Hardware ? Application ? 25
  26. 26. Compression in Memory Reduce amount of memory needed for same working set Reduce IO for Fixed amount of Memory Typically in-Memory Performance Hit Encoding/Dictionary Compression are good fit 26
  27. 27. Database Data Store Reduce Database Size on Disk Works with all file systems and storage With OS cache can be used as In- Memory compression variant Dealing with fragmentation is common issue 27
  28. 28. Compression on File System Level Works with all Databases/Storage Engines Performance Impact can be significant Logical Space on disk is not reduced ZFS 28
  29. 29. Compression on Storage Hardware Hardware Dependent Does not reduce space on disk Can result in Performance Gains rather than free space (SSD) Can become a choke point 29
  30. 30. By Application No Database Support needed Reduce Database Load and Network Traffic Application may know more about data More Complexity Give up many DBMS features (search, index) 30
  31. 31. DESIGN CONSIDERATIONS What makes database system to do well with compression 31
  32. 32. The Goal Minimize Negative Impact for User Operations (Reads and Writes) 32
  33. 33. Design Principles Fast Decompression Compression in Background Parallel Compression/Decompression Reduce need of Re- Compression on Update 33
  34. 34. Choosing Block Size Large Blocks •Most efficient for compression •Bulky Read Writes Small Blocks •Fastest to Decompress •Best for point lookups 34
  35. 35. IMPLEMENTATION EXAMPLES What Database systems Really do with Compression 35
  36. 36. MySQL “Packed” MyISAM Compress table “offline” with myisampack Table Becomes Read Only Variety of compression methods are used Only data is compressed, not indexes Note MyISAM support index prefix compression for all indexes 36
  37. 37. MySQL Archive Storage Engine Does not support indexes Essentially file of rows with sequential access Uses zlib compression 37
  38. 38. Innodb Table Compression Available Since MySQL 5.1 Pages compressed using zlib Compressed page target (1K, 4K, 8K) has to be set Both Compressed and Uncompressed pages can be cached in Buffer Pool Per Page “log” of changes to avoid recompression Extenrally Stored BLOBs are compressed as single entity 38
  39. 39. Innodb Transparent Page Compression Available in MySQL 5.7 Zib and LZ4 Compression Compresses pages as they are written to disk Free space on the page is given back using “hole punching” Originally designed to work with FusionIO NVMFS Can cause problems for current filesystem due to very high hole number 39
  40. 40. Disk usage (Linkbench data set by Sunny Bains) 40
  41. 41. Performance on Fast SSD (FusionIO NVMFS) 41
  42. 42. Results on Slower SSD (Intel 730*2, EXT4) 42
  43. 43. Fractal Trees Compression Available as Storage Engine for MySQL and MongoDB Can use many compression libraries Tunable Compression Block Size Reduce Re-Compression due to message buffering 43
  44. 44. Can get a lot of compression 44
  45. 45. MongoDB WiredTiger Storage Engine Engine Has many compression settings Indexes are using Index Prefix Compression Data Pages can be compressed using zlib or Snappy 45
  46. 46. Compression Size (results by Asya Kamsky) 46
  47. 47. Compression in RocksDB RocksDB – LSM Based Storage Engine for MongoDB and MySQL LSM works very well with compression Supports, zlib, lz4, bzip2 compression Can use different compression methods for different Levels in LSM 47
  48. 48. Compression results from Mike Kania 48
  49. 49. PostgreSQL Uses compression by default with TOAST 2KB (default) or longer Strings, BlOBs Unlike Innodb External Storage is not required for Compression Recommended to use File system compression ie ZFS if Compression is Desired 49
  50. 50. Summary Compression is Important in Modern Age Consider it for your system Many different techniques are used to make data smaller by databases Compression support is rapidly changing and improving 50
  51. 51. Want More ? I’m talking about MySQL Replication Options Free (as in Beer) Moscow MySQL Users Group meetup November 6th, Hosted by 51
  52. 52. Percona Live 2016 call for paper is Open Call for Papers Open until November 29, 2016 MySQL, MongoDB, NoSQL, Data in The Cloud Anything to make Data Happy! 52
  53. 53. 53 Thank You! Peter Zaitsev