MyRocks in MariaDB
Sergei Petrunia
sergey@mariadb.com
- What is MyRocks
- MyRocks’ way into MariaDB
- Using MyRocks
3
What is MyRocks
● InnoDB’s limitations
● LSM Trees
● RocksDB + MyRocks
4
Data vs Disk

Put some data into the database

How much is written to disk?
INSERT INTO tbl1 VALUES ('foo')

amplification =
size of data
size of data on disk
5
Amplification

Write amplification

Space amplification

Read amplification
amplification =
size of data
size of data on disk
foo
6
Amplification in InnoDB
● B*-tree
● Read amplification
– Assume random data lookup
– Locate the page, read it
– Root page is cached
● ~1 disk read for the leaf page
– Read amplification is ok
7
Write amplification in InnoDB
● Locate the page to update
● Read; modify; write
– Write more if we caused a page split
● One page write per update!
– page_size / sizeof(int) = 16K/8
● Write amplification is an issue.
8
Space amplification in InnoDB
● Page fill factor <100%
– to allow for updates
● Compression is done per-page
– Compressing bigger portions would be
better
– Page alignment
● Compressed to 3K ? Still on 4k page.
● => Space amplification is an issue
9
InnoDB amplification summary
● Read amplification is ok
● Write and space amplification is an issue
– Saturated IO
– SSDs wear out faster
– Need more space on SSD
● => Low storage efficiency
10
What is MyRocks
● InnoDB’s limitations
● LSM Trees
● RocksDB + MyRocks
11
Log-structured merge tree
● Writes go to
– MemTable
– Log (for durablility)
● When MemTable is full, it is
flushed to disk
● SST= SortedStringTable
– Is written sequentially
– Allows for lookups
MemTableWrite
Log SST
MemTable
12
Log-structured merge tree
● Writing produces
more SSTs
● SSTs are immutable
● SSTs may have
multiple versions of
data
MemTableWrite
Log SST
MemTable
SST ...
13
Reads in LSM tree
● Need to merge the
data on read
– Read amplification
suffers
● Should not have too
many SSTs.
MemTable Read
Log SST
MemTable
SST ...SST
14
Compaction
● Merges multiple SSTs into one
● Removes old data versions
● Reduces the number of files
● Write amplification++
SST SST . . .SST
SST
15
LSM Tree summary
● LSM architecture
– Data is stored in the log, then SST files
– Writes to SST files are sequential
– Have to look in several SST files to read the data
● Efficiency
– Write amplification is reduced
– Space amplification is reduced
– Read amplification increases
16
What is MyRocks
● InnoDB’s limitations
● LSM Trees
● RocksDB + MyRocks
17
RocksDB
● “An embeddable key-value store for
fast storage environments”
● LSM architecture
● Developed at Facebook (2012)
● Used at Facebook and many other companies
● It’s a key-value store, not a database
– C++ Library (local only)
– NoSQL-like API (Get, Put, Delete, Scan)
– No datatypes, tables, secondary indexes, ...
18
MyRocks
● A MySQL storage engine
● Uses RocksDB for storage
● Implements a MySQL storage engine
on top
– Secondary indexes
– Data types
– SQL transactions
– …
● Developed* and used by Facebook
– *-- with some MariaDB involvement
MyRocks
RocksDB
MySQL
InnoDB
19
Size amplification benchmark
● Benchmark data and
chart from Facebook
● Linkbench run
● 24 hours
20
Write amplification benchmark
● Benchmark data and
chart from Facebook
● Linkbench
21
QPS
● Benchmark data and
chart from Facebook
● Linkbench
22
QPS on in-memory workload
● Benchmark data and
chart from Facebook
● Sysbench read-write,
in-memory
● MyRocks doesn’t
always beat InnoDB
23
QPS
● Benchmark data and
chart from Facebook
● Linkbench
24
Another write amplification test
● InnoDB
vs
● MyRocks serving 2x
data
InnoDB 2x RocksDB
0
4
8
12
Flash GC
Binlog / Relay log
Storage engine
25
A real efficiency test
For a certain workload we could reduce the number of MySQL
servers by 50%
– Yoshinori Matsunobu
- What is MyRocks
- MyRocks’ way into MariaDB
- Using MyRocks
27
The source of MyRocks
● MyRocks is a part of Facebook’s MySQL tree
– github.com/facebook/mysql-5.6
● No binaries or packages
● No releases
● Extra features
● “In-house” experience
28
MyRocks in MariaDB
● MyRocks is a part of MariaDB 10.2 and MariaDB 10.3
– MyRocks itself is the same across 10.2 and 10.3
– New related feature in 10.3: “Per-engine gtid_slave_pos”
● MyRocks is a loadable plugin with its own Maturity level.
● Releases
– April, 2017: MyRocks is Alpha (in MariaDB 10.2.5 RC)
– January, 2018: MyRocks is Beta
– February, 2018: MyRocks is RC
● Release pending
- What is MyRocks
- MyRocks’ way into MariaDB
- Using MyRocks
30
Using MyRocks
● Installation
● Migration
● Tuning
● Replication
● Backup
31
MyRocks packages and binaries
Packages are available for modern distributions
● RPM-based (RedHat, CentOS)
● Deb-based (Debian, Ubuntu)
● Binary tarball
– mariadb-10.2.12-linux-systemd-x86_64.tar.gz
● Windows
● ...
32
Installing MyRocks
● Install the plugin
– RPM
– DEB
– Bintar/windows:
apt install mariadb-plugin-rocksdb
MariaDB> install soname 'ha_rocksdb';
MariaDB> create table t1( … ) engine=rocksdb;
yum install MariaDB-rocksdb-engine
● Restart the server
● Check
● Use
MariaDB> SHOW PLUGINS;
33
Using MyRocks
● Installation
● Migration
● Tuning
● Replication
● Backup
34
Data loading – good news
● It’s a write-optimized storage engine
● The same data in MyRocks takes less space on disk
than on InnoDB
– 1.5x, 3x, or more
● Can load the data faster than InnoDB
● But…
35
Data loading – limitations
● Limitation: Transaction must fit in memory
mysql> ALTER TABLE big_table ENGINE=RocksDB;
ERROR 2013 (HY000): Lost connection to MySQL server during query
● Uses a lot of memory, then killed by OOM killer.
● Need to use special settings for loading data
– https://mariadb.com/kb/en/library/loading-data-into-myrocks/
– https://github.com/facebook/mysql-5.6/wiki/data-loading
36
Sorted bulk loading
MariaDB> set rocksdb_bulk_load=1;
... # something that inserts the data in PK order
MariaDB> set rocksdb_bulk_load=0;
Bypasses
● Transactional locking
● Seeing the data you are inserting
● Compactions
● ...
37
Unsorted bulk loading
MariaDB> set rocksdb_bulk_load_allow_unsorted=0;
MariaDB> set rocksdb_bulk_load=1;
... # something that inserts data in any order
MariaDB> set rocksdb_bulk_load=0;
● Same as previous but doesn’t require PK order
38
Other speedups for lading
● rocksdb_commit_in_the_middle
– Auto commit every rocksdb_bulk_load_size rows
● @@unique_checks
– Setting to FALSE will bypass unique checks
39
Creating indexes
● Index creation doesn’t need any special settings
mysql> ALTER TABLE myrocks_table ADD INDEX(...);
● Compression is enabled by default
– Lots of settings to tune it further
40
max_row_locks as a safety setting
● Avoid run-away memory usage and OOM killer:
MariaDB> set rocksdb_max_row_locks=10000;
MariaDB> alter table t1 engine=rocksdb;
ERROR 4067 (HY000): Status error 10 received from RocksDB:
Operation aborted: Failed to acquire lock due to
max_num_locks limit
● This setting is useful after migration, too.
41
Using MyRocks
● Installation
● Migration
● Tuning
● Replication
● Backup
42
Essential MyRocks tuning
● rocksdb_flush_log_at_trx_commit=1
– Same as innodb_flush_log_at_trx_commit=1
– Together with sync_binlog=1 makes the master crash-
safe
● rocksdb_block_cache_size=...
– Similar to innodb_buffer_pool_size
– 500Mb by default (will use up to that amount * 1.5?)
– Set higher if have more RAM
● Safety: rocksdb_max_row_locks=...
43
Indexes
● Primary Key is the clustered index (like in InnoDB)
● Secondary indexes include PK columns (like in InnoDB)
● Non-unique secondary indexes are cheap
– Index maintenance is read-free
● Unique secondary indexes are more expensive
– Maintenance is not read-free
– Reads can suffer from read amplification
– Can use @@unique_check=false, at your own risk
44
Charsets and collations
● Index entries are compared with memcmp(), “mem-comparable”
● “Reversible collations”: binary, latin1_bin, utf8_bin
– Can convert values to/from their mem-comparable form
● “Restorable collations”
– 1-byte characters, one weght per character
– e.g. latin1_general_ci, latin1_swedish_ci, ...
– Index stores mem-comparable form + restore data
● Other collations (e.g. utf8_general_ci)
– Index-only scans are not supported
– Table row stores the original value
‘A’ a 1
‘a’ a 0
45
Charsets and collations (2)
● Using indexes on “Other” (collation) produces a warning
MariaDB> create table t3 (...
a varchar(100) collate utf8_general_ci, key(a)) engine=rocksdb;
Query OK, 0 rows affected, 1 warning (0.20 sec)
MariaDB> show warningsG
*************************** 1. row ***************************
Level: Warning
Code: 1815
Message: Internal error: Indexed column test.t3.a uses a collation that
does not allow index-only access in secondary key and has reduced disk
space efficiency in primary key.
46
Using MyRocks
● Installing
● Migration
● Tuning
● Replication
● Backup
47
MyRocks only supports RBR
● MyRocks’ highest isolation level is “snapshot isolation”,
that is REPEATABLE-READ
● Statement-Based Replication
– Slave runs the statements sequentially (~ serializable)
– InnoDB uses “Gap Locks” to provide isolation req’d by SBR
– MyRocks doesn’t support Gap Locks
● Row-Based Replication must be used
– binlog_format=row
48
Parallel replication
● Facebook/MySQL-5.6 is based on MySQL 5.6
– Parallel replication for data in different databases
● MariaDB has more advanced parallel slave
(controlled by @@slave_parallel_mode)
– Conservative (group-commit based)
– Optimistic (apply transactions in parallel, on conflict roll
back the transaction with greater seq_no)
– Aggressive
49
Parallel replication (2)
● Conservative parallel
replication works with MyRocks
– Different execution paths for
log_slave_updates=ON|OFF
– ON might be faster
● Optimistic and aggressive do
not provide speedups over
conservative.
50
mysql.gtid_slave_pos
●
mysql.gtid_slave_pos stores slave’s position
● Use a transactional storage engine for it
– Database contents will match slave position after crash
recovery
– Makes the slave crash-safe.
●
mysql.gtid_slave_pos uses a different engine?
– Cross-engine transaction (slow).
51
Per-engine mysql.gtid_slave_pos
● The idea:
– Have mysql.gtid_slave_pos_${engine} for each
storage engine
– Slave position is the biggest position in all tables.
– Transaction affecting only MyRocks will only update
mysql.gtid_slave_pos_rocksdb
● Configuration:
--gtid-pos-auto-engines=engine1,engine2,...
52
Per-engine mysql.gtid_slave_pos
● Available in MariaDB 10.3
● Thanks for the patch to
– Kristian Nielsen (implementation)
– Booking.com (request and funding)
● Don’t let your slave be 2-3x slower:
– 10.3: my.cnf: gtid-pos-auto-engines=INNODB,ROCKSDB
– 10.2: MariaDB> ALTER TABLE mysql.gtid_slave_pos
ENGINE=RocksDB;
53
Replication takeaways
● Must use binlog_format=row
● Parallel replication works
(slave_parallel_mode=conservative)
● Set your mysql.gtid_slave_pos correctly
– 10.2: preferred engine
– 10.3: per-engine
54
Using MyRocks
● Installation
● Migration
● Tuning
● Replication
● Backup
55
myrocks_hotbackup tool
● Takes a hot (non-blocking) backup of RocksDB data
mkdir /path/to/checkpoint-dir
myrocks_hotbackup --user=... --port=... 
--checkpoint_dir=/path/to/checkpoint-dir | ssh "tar -xi ..."
myrocks_hotbackup --move_back 
--datadir=/path/to/restore-dir 
--rocksdb_datadir=/path/to/restore-dir/#rocksdb 
--rocksdb_waldir= /path/to/restore-dir/#rocksdb 
--backup_dir=/where/tarball/was/unpacked
● Making a backup
● Starting from backup
56
myrocks_hotbackup limitations
● Targets MyRocks-only installs
– Copies MyRocks data and supplementary files
– Does *not* work for mixed MyRocks+InnoDB
setups
● Assumes no DDL is running concurrently with the
backup
– May get a wrong frm file
● Not very user-friendly
57
Further backup considerations
● mariabackup does not work with MyRocks
currently
● myrocks_hotbackup is similar (actually simpler) than
xtrabackup
● Will look at making mariabackup work with MyRocks
58
Using MyRocks
● Installation
● Migration
● Tuning
● Replication
● Backup
✔
✔
✔
✔
✔
59
Documentation
● https://mariadb.com/kb/en/library/myrocks/
● https://github.com/facebook/mysql-5.6/wiki
● https://mariadb.com/kb/en/library/differences-
between-myrocks-variants/
60
Conclusions
● MyRocks is a write-optimized storage engine based
on LSM-trees
– Developed and used by Facebook
● Available in MariaDB 10.2 and MariaDB 10.3
– Currently RC, Stable very soon
● It has some limitations and requires some tuning
– Easy to tackle if you were here for this talk :-)
Thanks!
Q&A

M|18 How to use MyRocks with MariaDB Server

  • 1.
    MyRocks in MariaDB SergeiPetrunia sergey@mariadb.com
  • 2.
    - What isMyRocks - MyRocks’ way into MariaDB - Using MyRocks
  • 3.
    3 What is MyRocks ●InnoDB’s limitations ● LSM Trees ● RocksDB + MyRocks
  • 4.
    4 Data vs Disk  Putsome data into the database  How much is written to disk? INSERT INTO tbl1 VALUES ('foo')  amplification = size of data size of data on disk
  • 5.
    5 Amplification  Write amplification  Space amplification  Readamplification amplification = size of data size of data on disk foo
  • 6.
    6 Amplification in InnoDB ●B*-tree ● Read amplification – Assume random data lookup – Locate the page, read it – Root page is cached ● ~1 disk read for the leaf page – Read amplification is ok
  • 7.
    7 Write amplification inInnoDB ● Locate the page to update ● Read; modify; write – Write more if we caused a page split ● One page write per update! – page_size / sizeof(int) = 16K/8 ● Write amplification is an issue.
  • 8.
    8 Space amplification inInnoDB ● Page fill factor <100% – to allow for updates ● Compression is done per-page – Compressing bigger portions would be better – Page alignment ● Compressed to 3K ? Still on 4k page. ● => Space amplification is an issue
  • 9.
    9 InnoDB amplification summary ●Read amplification is ok ● Write and space amplification is an issue – Saturated IO – SSDs wear out faster – Need more space on SSD ● => Low storage efficiency
  • 10.
    10 What is MyRocks ●InnoDB’s limitations ● LSM Trees ● RocksDB + MyRocks
  • 11.
    11 Log-structured merge tree ●Writes go to – MemTable – Log (for durablility) ● When MemTable is full, it is flushed to disk ● SST= SortedStringTable – Is written sequentially – Allows for lookups MemTableWrite Log SST MemTable
  • 12.
    12 Log-structured merge tree ●Writing produces more SSTs ● SSTs are immutable ● SSTs may have multiple versions of data MemTableWrite Log SST MemTable SST ...
  • 13.
    13 Reads in LSMtree ● Need to merge the data on read – Read amplification suffers ● Should not have too many SSTs. MemTable Read Log SST MemTable SST ...SST
  • 14.
    14 Compaction ● Merges multipleSSTs into one ● Removes old data versions ● Reduces the number of files ● Write amplification++ SST SST . . .SST SST
  • 15.
    15 LSM Tree summary ●LSM architecture – Data is stored in the log, then SST files – Writes to SST files are sequential – Have to look in several SST files to read the data ● Efficiency – Write amplification is reduced – Space amplification is reduced – Read amplification increases
  • 16.
    16 What is MyRocks ●InnoDB’s limitations ● LSM Trees ● RocksDB + MyRocks
  • 17.
    17 RocksDB ● “An embeddablekey-value store for fast storage environments” ● LSM architecture ● Developed at Facebook (2012) ● Used at Facebook and many other companies ● It’s a key-value store, not a database – C++ Library (local only) – NoSQL-like API (Get, Put, Delete, Scan) – No datatypes, tables, secondary indexes, ...
  • 18.
    18 MyRocks ● A MySQLstorage engine ● Uses RocksDB for storage ● Implements a MySQL storage engine on top – Secondary indexes – Data types – SQL transactions – … ● Developed* and used by Facebook – *-- with some MariaDB involvement MyRocks RocksDB MySQL InnoDB
  • 19.
    19 Size amplification benchmark ●Benchmark data and chart from Facebook ● Linkbench run ● 24 hours
  • 20.
    20 Write amplification benchmark ●Benchmark data and chart from Facebook ● Linkbench
  • 21.
    21 QPS ● Benchmark dataand chart from Facebook ● Linkbench
  • 22.
    22 QPS on in-memoryworkload ● Benchmark data and chart from Facebook ● Sysbench read-write, in-memory ● MyRocks doesn’t always beat InnoDB
  • 23.
    23 QPS ● Benchmark dataand chart from Facebook ● Linkbench
  • 24.
    24 Another write amplificationtest ● InnoDB vs ● MyRocks serving 2x data InnoDB 2x RocksDB 0 4 8 12 Flash GC Binlog / Relay log Storage engine
  • 25.
    25 A real efficiencytest For a certain workload we could reduce the number of MySQL servers by 50% – Yoshinori Matsunobu
  • 26.
    - What isMyRocks - MyRocks’ way into MariaDB - Using MyRocks
  • 27.
    27 The source ofMyRocks ● MyRocks is a part of Facebook’s MySQL tree – github.com/facebook/mysql-5.6 ● No binaries or packages ● No releases ● Extra features ● “In-house” experience
  • 28.
    28 MyRocks in MariaDB ●MyRocks is a part of MariaDB 10.2 and MariaDB 10.3 – MyRocks itself is the same across 10.2 and 10.3 – New related feature in 10.3: “Per-engine gtid_slave_pos” ● MyRocks is a loadable plugin with its own Maturity level. ● Releases – April, 2017: MyRocks is Alpha (in MariaDB 10.2.5 RC) – January, 2018: MyRocks is Beta – February, 2018: MyRocks is RC ● Release pending
  • 29.
    - What isMyRocks - MyRocks’ way into MariaDB - Using MyRocks
  • 30.
    30 Using MyRocks ● Installation ●Migration ● Tuning ● Replication ● Backup
  • 31.
    31 MyRocks packages andbinaries Packages are available for modern distributions ● RPM-based (RedHat, CentOS) ● Deb-based (Debian, Ubuntu) ● Binary tarball – mariadb-10.2.12-linux-systemd-x86_64.tar.gz ● Windows ● ...
  • 32.
    32 Installing MyRocks ● Installthe plugin – RPM – DEB – Bintar/windows: apt install mariadb-plugin-rocksdb MariaDB> install soname 'ha_rocksdb'; MariaDB> create table t1( … ) engine=rocksdb; yum install MariaDB-rocksdb-engine ● Restart the server ● Check ● Use MariaDB> SHOW PLUGINS;
  • 33.
    33 Using MyRocks ● Installation ●Migration ● Tuning ● Replication ● Backup
  • 34.
    34 Data loading –good news ● It’s a write-optimized storage engine ● The same data in MyRocks takes less space on disk than on InnoDB – 1.5x, 3x, or more ● Can load the data faster than InnoDB ● But…
  • 35.
    35 Data loading –limitations ● Limitation: Transaction must fit in memory mysql> ALTER TABLE big_table ENGINE=RocksDB; ERROR 2013 (HY000): Lost connection to MySQL server during query ● Uses a lot of memory, then killed by OOM killer. ● Need to use special settings for loading data – https://mariadb.com/kb/en/library/loading-data-into-myrocks/ – https://github.com/facebook/mysql-5.6/wiki/data-loading
  • 36.
    36 Sorted bulk loading MariaDB>set rocksdb_bulk_load=1; ... # something that inserts the data in PK order MariaDB> set rocksdb_bulk_load=0; Bypasses ● Transactional locking ● Seeing the data you are inserting ● Compactions ● ...
  • 37.
    37 Unsorted bulk loading MariaDB>set rocksdb_bulk_load_allow_unsorted=0; MariaDB> set rocksdb_bulk_load=1; ... # something that inserts data in any order MariaDB> set rocksdb_bulk_load=0; ● Same as previous but doesn’t require PK order
  • 38.
    38 Other speedups forlading ● rocksdb_commit_in_the_middle – Auto commit every rocksdb_bulk_load_size rows ● @@unique_checks – Setting to FALSE will bypass unique checks
  • 39.
    39 Creating indexes ● Indexcreation doesn’t need any special settings mysql> ALTER TABLE myrocks_table ADD INDEX(...); ● Compression is enabled by default – Lots of settings to tune it further
  • 40.
    40 max_row_locks as asafety setting ● Avoid run-away memory usage and OOM killer: MariaDB> set rocksdb_max_row_locks=10000; MariaDB> alter table t1 engine=rocksdb; ERROR 4067 (HY000): Status error 10 received from RocksDB: Operation aborted: Failed to acquire lock due to max_num_locks limit ● This setting is useful after migration, too.
  • 41.
    41 Using MyRocks ● Installation ●Migration ● Tuning ● Replication ● Backup
  • 42.
    42 Essential MyRocks tuning ●rocksdb_flush_log_at_trx_commit=1 – Same as innodb_flush_log_at_trx_commit=1 – Together with sync_binlog=1 makes the master crash- safe ● rocksdb_block_cache_size=... – Similar to innodb_buffer_pool_size – 500Mb by default (will use up to that amount * 1.5?) – Set higher if have more RAM ● Safety: rocksdb_max_row_locks=...
  • 43.
    43 Indexes ● Primary Keyis the clustered index (like in InnoDB) ● Secondary indexes include PK columns (like in InnoDB) ● Non-unique secondary indexes are cheap – Index maintenance is read-free ● Unique secondary indexes are more expensive – Maintenance is not read-free – Reads can suffer from read amplification – Can use @@unique_check=false, at your own risk
  • 44.
    44 Charsets and collations ●Index entries are compared with memcmp(), “mem-comparable” ● “Reversible collations”: binary, latin1_bin, utf8_bin – Can convert values to/from their mem-comparable form ● “Restorable collations” – 1-byte characters, one weght per character – e.g. latin1_general_ci, latin1_swedish_ci, ... – Index stores mem-comparable form + restore data ● Other collations (e.g. utf8_general_ci) – Index-only scans are not supported – Table row stores the original value ‘A’ a 1 ‘a’ a 0
  • 45.
    45 Charsets and collations(2) ● Using indexes on “Other” (collation) produces a warning MariaDB> create table t3 (... a varchar(100) collate utf8_general_ci, key(a)) engine=rocksdb; Query OK, 0 rows affected, 1 warning (0.20 sec) MariaDB> show warningsG *************************** 1. row *************************** Level: Warning Code: 1815 Message: Internal error: Indexed column test.t3.a uses a collation that does not allow index-only access in secondary key and has reduced disk space efficiency in primary key.
  • 46.
    46 Using MyRocks ● Installing ●Migration ● Tuning ● Replication ● Backup
  • 47.
    47 MyRocks only supportsRBR ● MyRocks’ highest isolation level is “snapshot isolation”, that is REPEATABLE-READ ● Statement-Based Replication – Slave runs the statements sequentially (~ serializable) – InnoDB uses “Gap Locks” to provide isolation req’d by SBR – MyRocks doesn’t support Gap Locks ● Row-Based Replication must be used – binlog_format=row
  • 48.
    48 Parallel replication ● Facebook/MySQL-5.6is based on MySQL 5.6 – Parallel replication for data in different databases ● MariaDB has more advanced parallel slave (controlled by @@slave_parallel_mode) – Conservative (group-commit based) – Optimistic (apply transactions in parallel, on conflict roll back the transaction with greater seq_no) – Aggressive
  • 49.
    49 Parallel replication (2) ●Conservative parallel replication works with MyRocks – Different execution paths for log_slave_updates=ON|OFF – ON might be faster ● Optimistic and aggressive do not provide speedups over conservative.
  • 50.
    50 mysql.gtid_slave_pos ● mysql.gtid_slave_pos stores slave’sposition ● Use a transactional storage engine for it – Database contents will match slave position after crash recovery – Makes the slave crash-safe. ● mysql.gtid_slave_pos uses a different engine? – Cross-engine transaction (slow).
  • 51.
    51 Per-engine mysql.gtid_slave_pos ● Theidea: – Have mysql.gtid_slave_pos_${engine} for each storage engine – Slave position is the biggest position in all tables. – Transaction affecting only MyRocks will only update mysql.gtid_slave_pos_rocksdb ● Configuration: --gtid-pos-auto-engines=engine1,engine2,...
  • 52.
    52 Per-engine mysql.gtid_slave_pos ● Availablein MariaDB 10.3 ● Thanks for the patch to – Kristian Nielsen (implementation) – Booking.com (request and funding) ● Don’t let your slave be 2-3x slower: – 10.3: my.cnf: gtid-pos-auto-engines=INNODB,ROCKSDB – 10.2: MariaDB> ALTER TABLE mysql.gtid_slave_pos ENGINE=RocksDB;
  • 53.
    53 Replication takeaways ● Mustuse binlog_format=row ● Parallel replication works (slave_parallel_mode=conservative) ● Set your mysql.gtid_slave_pos correctly – 10.2: preferred engine – 10.3: per-engine
  • 54.
    54 Using MyRocks ● Installation ●Migration ● Tuning ● Replication ● Backup
  • 55.
    55 myrocks_hotbackup tool ● Takesa hot (non-blocking) backup of RocksDB data mkdir /path/to/checkpoint-dir myrocks_hotbackup --user=... --port=... --checkpoint_dir=/path/to/checkpoint-dir | ssh "tar -xi ..." myrocks_hotbackup --move_back --datadir=/path/to/restore-dir --rocksdb_datadir=/path/to/restore-dir/#rocksdb --rocksdb_waldir= /path/to/restore-dir/#rocksdb --backup_dir=/where/tarball/was/unpacked ● Making a backup ● Starting from backup
  • 56.
    56 myrocks_hotbackup limitations ● TargetsMyRocks-only installs – Copies MyRocks data and supplementary files – Does *not* work for mixed MyRocks+InnoDB setups ● Assumes no DDL is running concurrently with the backup – May get a wrong frm file ● Not very user-friendly
  • 57.
    57 Further backup considerations ●mariabackup does not work with MyRocks currently ● myrocks_hotbackup is similar (actually simpler) than xtrabackup ● Will look at making mariabackup work with MyRocks
  • 58.
    58 Using MyRocks ● Installation ●Migration ● Tuning ● Replication ● Backup ✔ ✔ ✔ ✔ ✔
  • 59.
  • 60.
    60 Conclusions ● MyRocks isa write-optimized storage engine based on LSM-trees – Developed and used by Facebook ● Available in MariaDB 10.2 and MariaDB 10.3 – Currently RC, Stable very soon ● It has some limitations and requires some tuning – Easy to tackle if you were here for this talk :-)
  • 61.