Compression in Open Source Databases
Peter Zaitsev
CEO, Percona
MySQL Central @ OOW
26 Oct 2015
Few Words About Percona
• Your partner in MySQL and MongoDB success
• 100% open source software
• “No Lock-in Required” solutions and services
• We work with MySQL, MariaDB, MongoDB, Amazon RDS and Aurora
About the Talk
• A bit of history
• Approaches to data compression
• What some of the popular systems implement
Let’s Define the Term
Compression: any technique to make data size smaller
A Bit of History
• Early computers were too slow to compress data in software
• Hardware compression (e.g., tape drives)
• Compression first appeared for non-performance-critical data
We did not need it much for space…
Welcome to the Modern Age
• Data growth outpaces HDD improvements
• Powerful CPUs
• Flash
• Cloud
• The data we store now
Exponential Data Size Growth
Powerful CPUs
• High performance
• Multiple cores
Can Compress and Decompress Fast!
Snappy and LZ4:
• Up to 1 GB/sec compression
• Up to 2 GB/sec decompression
Flash
• Disk space is more costly than on HDDs
• Write endurance is expensive, so we want to write less data
• Decent at handling fragmentation
Cloud
• Pay for space, pay for IOPS
• Storage performance is more limited
• Network performance may be limited
Data We Store in Databases
• Text
• JSON
• XML
• Time series data
• Log files
Modern data compresses well!
COMPRESSION BASICS
An introduction to the ways of making your data smaller
Lossy and Lossless
• Databases generally use lossless compression
• Lossy compression is done at the application level
Some Ways of Getting Data Smaller
• Layout optimizations
• “Encoding”
• Dictionary compression
• Block compression
Layout Optimizations
• Column store versus row store
• Hybrid formats
• Variable block sizes
Encoding
• Depends on data type and domain
• Delta encoding, run-length encoding (RLE)
• Can be faster than reading uncompressed data
• UTF-8 (strings) and VLQ (integers)
• Index prefix compression
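A minimal sketch (not any specific database's implementation) of two encodings mentioned above: run-length encoding collapses repeated values, and delta encoding stores differences between successive values, which are typically small and pack into fewer bytes:

```python
def rle_encode(values):
    """Collapse runs of repeated values into (value, count) pairs."""
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1] = (v, out[-1][1] + 1)
        else:
            out.append((v, 1))
    return out

def rle_decode(pairs):
    return [v for v, n in pairs for _ in range(n)]

def delta_encode(values):
    """Store the first value, then successive differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

status = [200, 200, 200, 404, 404, 200]
print(rle_encode(status))        # [(200, 3), (404, 2), (200, 1)]

timestamps = [1000, 1001, 1003, 1006]
print(delta_encode(timestamps))  # [1000, 1, 2, 3]
```

Both map naturally onto time series and log data: repeated status codes suit RLE, while near-monotonic timestamps suit delta encoding.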
Dictionary Compression
• Replace frequent values with dictionary pointers
• Kind of like an STL string
• The ENUM type in MySQL works this way
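A toy sketch of the idea: frequent string values are replaced with small integer pointers into a dictionary, much as MySQL's ENUM stores one small integer per row instead of the string itself:

```python
def dict_encode(values):
    """Replace each value with an integer index into a dictionary."""
    dictionary = []   # index -> value
    index = {}        # value -> index
    encoded = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        encoded.append(index[v])
    return dictionary, encoded

def dict_decode(dictionary, encoded):
    return [dictionary[i] for i in encoded]

cities = ["London", "Paris", "London", "London", "Paris"]
d, enc = dict_encode(cities)
print(d, enc)  # ['London', 'Paris'] [0, 1, 0, 0, 1]
```

Each row now costs one small integer instead of a full string, and the dictionary is stored once.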
Block Compression
• Compress a “block” of data so it is smaller for storage
• Works by finding patterns in the data and encoding them efficiently
• Many algorithms exist: Snappy, zlib, LZ4, LZMA
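A minimal sketch of block compression using zlib, one of the algorithms named above (the same pattern applies to Snappy, LZ4 or LZMA via their own libraries):

```python
import zlib

def compress_block(block: bytes, level: int = 6) -> bytes:
    """Compress one storage block; level trades speed for ratio."""
    return zlib.compress(block, level)

def decompress_block(blob: bytes) -> bytes:
    return zlib.decompress(blob)

# Repetitive data (common in logs, JSON, XML) compresses very well:
block = b'{"status": "ok", "code": 200}\n' * 100
packed = compress_block(block)
print(len(block), "->", len(packed))
assert decompress_block(packed) == block
```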
Block Compression Details
• Compression rate depends highly on the data
• Compression rate depends on the block size
• Speed depends on block size and data
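Both dependencies are easy to demonstrate with zlib (used here only as an illustration): repetitive data compresses far better than random data, and larger blocks give the algorithm more context and amortize per-block overhead:

```python
import os
import zlib

def ratio(data: bytes) -> float:
    """Uncompressed size divided by compressed size."""
    return len(data) / len(zlib.compress(data))

text = b"time=12:00 level=INFO msg=request served\n" * 4096
noise = os.urandom(len(text))           # incompressible random bytes

print("log text ratio:", round(ratio(text), 1))   # high
print("random   ratio:", round(ratio(noise), 2))  # about 1.0

# Larger blocks of the same data compress better:
for size in (1024, 16 * 1024, 128 * 1024):
    print(f"{size:>7} B block -> ratio {ratio(text[:size]):.1f}")
```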
Block Size Dependence (by Leif Walsh)
There Is No One Size Fits All
• Typically the compression algorithm can be selected
• Often with additional settings
WHERE AND HOW
Where do we compress data, and how do we do it?
Where to Compress Data
• In memory?
• In the database data store?
• As part of the file system?
• In storage hardware?
• In the application?
Compression in Memory
• Reduces the amount of memory needed for the same working set
• Reduces IO for a fixed amount of memory
• Typically an in-memory performance hit
• Encoding and dictionary compression are a good fit
Database Data Store
• Reduces database size on disk
• Works with all file systems and storage
• With the OS cache it can act as an in-memory compression variant
• Dealing with fragmentation is a common issue
Compression at the File System Level
• Works with all databases and storage engines
• Performance impact can be significant
• Logical space on disk is not reduced
• Example: ZFS
Compression in Storage Hardware
• Hardware dependent
• Does not reduce space on disk
• Can result in performance gains rather than free space (SSD)
• Can become a choke point
By the Application
• No database support needed
• Reduces database load and network traffic
• The application may know more about the data
• More complexity
• Gives up many DBMS features (search, indexing)
DESIGN CONSIDERATIONS
What makes a database system do well with compression
The Goal
Minimize the negative impact on user operations (reads and writes)
Design Principles
• Fast decompression
• Compression in the background
• Parallel compression and decompression
• Reduce the need for re-compression on update
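As a sketch of the parallel compression/decompression principle: independent blocks can be compressed on multiple cores at once. This toy version uses Python threads with zlib (which releases the GIL while compressing large buffers, so the work genuinely overlaps); a real engine would do this in background worker threads:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_parallel(blocks, workers=4):
    """Compress a list of independent blocks concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, blocks))

# Eight independent 64 KB blocks, compressed in parallel:
blocks = [bytes([i]) * 65536 for i in range(8)]
packed = compress_parallel(blocks)
assert [zlib.decompress(p) for p in packed] == blocks
print("compressed", len(blocks), "blocks")
```

Because each block is self-contained, blocks can also be decompressed independently and in parallel at read time.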
Choosing Block Size
Large blocks:
• Most efficient for compression
• Bulky reads and writes
Small blocks:
• Fastest to decompress
• Best for point lookups
IMPLEMENTATION EXAMPLES
What database systems really do with compression
MySQL “Packed” MyISAM
• Compress the table “offline” with myisampack
• The table becomes read-only
• A variety of compression methods are used
• Only the data is compressed, not the indexes
• Note: MyISAM supports index prefix compression for all indexes
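Index prefix compression exploits the fact that adjacent keys in a sorted index often share a long common prefix. A simplified sketch of the idea (not MyISAM's actual on-disk format): each entry stores only the number of bytes it shares with the previous key plus its differing suffix:

```python
def prefix_compress(sorted_keys):
    """Store (shared_prefix_len, suffix) for each key in sorted order."""
    out, prev = [], ""
    for key in sorted_keys:
        n = 0
        while n < min(len(prev), len(key)) and prev[n] == key[n]:
            n += 1
        out.append((n, key[n:]))
        prev = key
    return out

def prefix_decompress(entries):
    out, prev = [], ""
    for n, suffix in entries:
        key = prev[:n] + suffix
        out.append(key)
        prev = key
    return out

keys = ["user:1000", "user:1001", "user:1002", "user:2000"]
print(prefix_compress(keys))
# [(0, 'user:1000'), (8, '1'), (8, '2'), (5, '2000')]
```

The trade-off: entries can no longer be decoded at random, since each one depends on its predecessor, which is one reason point lookups favor small blocks.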
MySQL Archive Storage Engine
• Does not support indexes
• Essentially a file of rows with sequential access
• Uses zlib compression
InnoDB Table Compression
• Available since MySQL 5.1
• Pages are compressed using zlib
• A compressed page target size (1K, 4K, 8K) has to be set
• Both compressed and uncompressed pages can be cached in the buffer pool
• A per-page “log” of changes avoids recompression
• Externally stored BLOBs are compressed as a single entity
InnoDB Transparent Page Compression
• Available in MySQL 5.7
• zlib and LZ4 compression
• Compresses pages as they are written to disk
• Free space on the page is given back using “hole punching”
• Originally designed to work with FusionIO NVMFS
• Can cause problems for current filesystems due to the very high number of holes
Disk usage (Linkbench data set by Sunny Bains)
Performance on Fast SSD (FusionIO NVMFS)
Results on Slower SSD (Intel 730*2, EXT4)
Fractal Tree Compression
• Available as a storage engine for MySQL and MongoDB
• Can use many compression libraries
• Tunable compression block size
• Reduced re-compression thanks to message buffering
Can get a lot of compression
MongoDB WiredTiger Storage Engine
• The engine has many compression settings
• Indexes use index prefix compression
• Data pages can be compressed using zlib or Snappy
Compression Size (results by Asya Kamsky)
Compression in RocksDB
• RocksDB: an LSM-based storage engine for MongoDB and MySQL
• LSM works very well with compression
• Supports zlib, LZ4, and bzip2 compression
• Can use different compression methods for different levels in the LSM
Compression results from Mike Kania
PostgreSQL
• Uses compression by default with TOAST
• Applies to strings and BLOBs of 2KB (default) or longer
• Unlike InnoDB, external storage is not required for compression
• File system compression (e.g., ZFS) is recommended if broader compression is desired
Summary
• Compression is important in the modern age; consider it for your system
• Many different techniques are used by databases to make data smaller
• Compression support is rapidly changing and improving
Want More?
• I’m talking about MySQL replication options at the free (as in beer) Moscow MySQL Users Group meetup
• November 6th, hosted by Mail.ru
• http://www.meetup.com/moscowmysql/
Percona Live 2016 Call for Papers Is Open
• Call for papers open until November 29, 2016
• MySQL, MongoDB, NoSQL, data in the cloud
• Anything to make data happy!
• http://bit.ly/PL16Call
Thank You!
Peter Zaitsev
pz@percona.com
https://www.linkedin.com/in/peterzaitsev
