Bill Karwin's presentation at the Percona Live Open Source Database Conference 2017 focuses on strategies for loading data into MySQL quickly. He discusses query, schema, and configuration techniques for improving performance, illustrating each with throughput measurements from different insertion methods. The talk concludes with parallel execution options and the importance of benchmarking to find the best approach for a specific use case.
Bill Karwin
Software developer, consultant, trainer
Using MySQL since 2000
Senior Database Architect at SchoolMessenger
SQL Antipatterns: Avoiding the Pitfalls of Database Programming
https://pragprog.com/titles/bksqla/sql-antipatterns
Oracle ACE Director
Load Data Fast!
Common chores
§ Dump and restore
§ Import third-party data
§ Extract, Transform, Load (ETL)
§ Test data that needs to be reloaded repeatedly
https://commons.wikimedia.org/wiki/File:Kitten_with_laptop_-_278017185.jpg
Is it done yet?
How to Speed This Up?
1. Query Solutions
2. Schema Solutions
3. Configuration Solutions
4. Parallel Execution Solutions
Example Table
CREATE TABLE TestTable (
id INT UNSIGNED NOT NULL PRIMARY KEY,
intCol INT UNSIGNED DEFAULT NULL,
stringCol VARCHAR(100) DEFAULT NULL,
textCol TEXT
) ENGINE=InnoDB;
Let’s load 1 million rows!
Best Case Performance
Running a test script to loop over 1 million rows, without inserting into a database.
$ php test-bulk-insert.php --total-rows 1000000 --noop
Its speed is the upper bound for any subsequent test.
Time: 2 seconds (00:00:02)
1000000 rows = 432435.24 rows/sec
1000000 stmt = 432435.24 stmt/sec
1000000 txns = 432435.24 txns/sec
1000000 conn = 432435.24 conn/sec
Worst Case Performance
INSERT INTO TestTable (id, intCol, stringCol, textCol)
VALUES (?, ?, ?, ?);
Run a test script that executes one INSERT, commits, and reconnects for every row.
$ php test-bulk-insert.php --total-rows 10000
Time: 34 seconds (00:00:34)
10000 rows = 290.29 rows/sec
10000 stmt = 290.29 stmt/sec
10000 txns = 290.29 txns/sec
10000 conn = 290.29 conn/sec
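Roughly, each iteration of the worst case does all of the following for a single row (the literal values below are placeholders, not taken from the test script):

-- open a new connection to the server
INSERT INTO TestTable (id, intCol, stringCol, textCol)
VALUES (1, 42, 'example string', 'example text');
-- autocommit is on, so this INSERT is its own transaction,
-- which (with default settings) flushes the redo log to disk
-- close the connection

Every row pays for connection setup plus a full transaction commit, which is why the rate collapses to a few hundred rows per second.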
Transactions
START TRANSACTION;
INSERT INTO TestTable …
INSERT INTO TestTable …
INSERT INTO TestTable …
INSERT INTO TestTable …
INSERT INTO TestTable …
INSERT INTO TestTable …
COMMIT;
Q: How many statements can you do in one transaction?
A: In theory this is constrained by undo log segments, but it's a lot.
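One practical way to apply this is to commit in batches rather than in one giant transaction, so undo log growth stays bounded while the per-commit cost is still amortized over many rows. A minimal sketch (the batch size of 1000 and the literal values are illustrative, not from the talk):

START TRANSACTION;
INSERT INTO TestTable (id, intCol, stringCol, textCol) VALUES (1, 11, 'a', 'aaa');
INSERT INTO TestTable (id, intCol, stringCol, textCol) VALUES (2, 22, 'b', 'bbb');
-- ... continue up to ~1000 rows ...
COMMIT;
START TRANSACTION;
-- ... next batch of rows ...
COMMIT;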
Inserting with Prepared Queries
START TRANSACTION;
PREPARE INSERT INTO TestTable …
EXECUTE …
EXECUTE …
EXECUTE …
EXECUTE …
COMMIT;
Q: How many times can you execute a given prepared statement?
A: There is no limit, as far as I can tell.
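The slide shows the idea in shorthand; spelled out in MySQL's server-side prepared statement syntax, it looks roughly like this (the literal values are placeholders):

PREPARE ins FROM
  'INSERT INTO TestTable (id, intCol, stringCol, textCol) VALUES (?, ?, ?, ?)';
START TRANSACTION;
SET @id = 1, @i = 42, @s = 'example', @t = 'example text';
EXECUTE ins USING @id, @i, @s, @t;
-- ... repeat SET / EXECUTE for each remaining row ...
COMMIT;
DEALLOCATE PREPARE ins;

In practice a client program usually does the same thing through its driver's prepared-statement API (e.g. PDO in PHP) rather than issuing PREPARE/EXECUTE as SQL text.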
LOAD DATA INFILE: Results
mysql> LOAD DATA LOCAL INFILE 'TestTable.csv'
INTO TABLE TestTable;
https://dev.mysql.com/doc/refman/8.0/en/load-data.html
Flat-file data load in a single transaction.
Works with replication.
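If the file is not in the default tab-separated format, the field/line format and column list can be given explicitly. A sketch assuming a comma-separated, optionally double-quoted CSV:

mysql> LOAD DATA LOCAL INFILE 'TestTable.csv'
       INTO TABLE TestTable
       FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
       LINES TERMINATED BY '\n'
       (id, intCol, stringCol, textCol);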
What about LOAD JSON INFILE?
Sorry, the hypothetical LOAD JSON INFILE is not supported by MySQL yet.
😭
But it has been proposed as a feature request:
https://bugs.mysql.com/bug.php?id=79209
Go vote for it!
Or better yet, implement it and contribute a patch!
Indexes
How much overhead for one index? Two indexes?
1. mysql> ALTER TABLE TestTable ADD INDEX (intCol);
2. mysql> ALTER TABLE TestTable ADD INDEX (stringCol);
Index Deferral
What if we insert with no indexes, and build the indexes at the end?
§ This is what Percona's mysqldump --innodb-optimize-keys does.
§ Load time is the same as when you have no indexes:
Time: 63 seconds (00:01:03)
1000000 rows = 15744.53 rows/sec
§ Then create the indexes after the data load:
mysql> ALTER TABLE TestTable ADD INDEX (intCol);
Query OK, 0 rows affected (7.02 sec)
mysql> ALTER TABLE TestTable ADD INDEX (stringCol);
Query OK, 0 rows affected (8.54 sec)
§ Counting the index builds, the effective data load rate drops:
Time: 63 + 7 + 8.5 = 78.5 seconds
1000000 rows = 12738.85 rows/sec (effective data load rate)
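Putting the pattern together, a sketch of the deferred-index approach end to end; adding both secondary indexes in one ALTER TABLE (roughly what Percona's --innodb-optimize-keys output arranges) combines the two index builds into a single operation:

mysql> CREATE TABLE TestTable (
         id INT UNSIGNED NOT NULL PRIMARY KEY,
         intCol INT UNSIGNED DEFAULT NULL,
         stringCol VARCHAR(100) DEFAULT NULL,
         textCol TEXT
       ) ENGINE=InnoDB;

mysql> LOAD DATA LOCAL INFILE 'TestTable.csv' INTO TABLE TestTable;

mysql> ALTER TABLE TestTable
         ADD INDEX (intCol),
         ADD INDEX (stringCol);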
Triggers
How much overhead for a trigger?
mysql> CREATE TRIGGER TestTrigger
BEFORE INSERT ON TestTable
FOR EACH ROW
SET NEW.stringCol = UPPER(NEW.stringCol);
This is a very simple trigger. If you have more complex code, like subordinate
INSERT statements, the cost will be higher.
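For contrast, a hypothetical example of a more expensive trigger (TestTableAudit and TestAuditTrigger are invented names for illustration): each inserted row now performs a second INSERT into an audit table, so every row adds an extra row write.

CREATE TABLE TestTableAudit (
  id INT UNSIGNED NOT NULL,
  insertedAt DATETIME NOT NULL
) ENGINE=InnoDB;

CREATE TRIGGER TestAuditTrigger
BEFORE INSERT ON TestTable
FOR EACH ROW
  INSERT INTO TestTableAudit (id, insertedAt)
  VALUES (NEW.id, NOW());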
Parallel Import
Like LOAD DATA INFILE, but supports multi-threaded import:
$ mysqlimport --local --use-threads=4 \
    dbname table1 table2 table3 table4
Runs a fixed number of threads, each importing one table at a time.
When a thread finishes and more tables remain, the first available thread takes the next one.
https://dev.mysql.com/doc/refman/8.0/en/mysqlimport.html
Parallel Import
Connecting to localhost
Connecting to localhost
Connecting to localhost
Connecting to localhost
Selecting database test
Selecting database test
Selecting database test
Selecting database test
Loading data from LOCAL file: TestTable2.csv into TestTable2
Loading data from LOCAL file: TestTable3.csv into TestTable3
Loading data from LOCAL file: TestTable1.csv into TestTable1
Loading data from LOCAL file: TestTable4.csv into TestTable4
test.TestTable3: Records: 250000 Deleted: 0 Skipped: 0 Warnings: 0
Disconnecting from localhost
test.TestTable1: Records: 250000 Deleted: 0 Skipped: 0 Warnings: 0
Disconnecting from localhost
test.TestTable2: Records: 250000 Deleted: 0 Skipped: 0 Warnings: 0
Disconnecting from localhost
test.TestTable4: Records: 250000 Deleted: 0 Skipped: 0 Warnings: 0
Disconnecting from localhost
Want to Try the Tests Yourself?
The test-bulk-insert.php script is available here:
https://github.com/billkarwin/bk-tools
One Last Thing…
What Was Our Solution?
We cheated:
§ Load database once.
§ Take a filesystem snapshot.
§ Run tests.
§ Restore from snapshot.
§ Re-run tests.
§ etc.
This is not a good solution for everyone. It worked for one specific use case.
License and Copyright
Copyright 2017 Bill Karwin
http://www.slideshare.net/billkarwin
Released under a Creative Commons 3.0 License:
http://creativecommons.org/licenses/by-nc-nd/3.0/
You are free to share: to copy, distribute, and transmit this work, under the following conditions:
§ Attribution. You must attribute this work to Bill Karwin.
§ Noncommercial. You may not use this work for commercial purposes.
§ No Derivative Works. You may not alter, transform, or build upon this work.