Advanced MariaDB features
that developers love
Federico Razzoli
vettabase.com

$ whoami
Hi, I’m Federico Razzoli, founder of Vettabase
Database specialist, open source supporter,
long time MariaDB and MySQL user
Despite my accent, I live in [flag]
● vettabase.com
● federico.razzoli@vettabase.com

This talk is…
● NOT a MariaDB tutorial
● We’ll see some cool MariaDB features arbitrarily chosen by me:
○ CONNECT storage engine
○ Temporal tables
○ JSON columns
● In the process, we’ll also take a quick look at some other nice features

Querying remote / heterogeneous
data sources… in SQL

Why?
● Because… it’s SQL!
● Because sometimes we import data from other sources
● Because you may have to interact with systems you don’t control

CONNECT storage engine
● MariaDB doesn’t know how to read/write tables, indexes, data caches…
● These actions are delegated to plugins called storage engines
● A cool consequence is that, when we run a query, a storage engine can do anything it likes
○ BLACKHOLE does nothing
○ SEQUENCE returns numerical sequences
○ Run SHOW ENGINES to see which engines are available
● CONNECT treats an external data source as if it were a local table
● So you can run your SELECTs, JOINs, INSERTs…
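
A minimal sketch of the engines mentioned above, assuming SEQUENCE and BLACKHOLE are both enabled on your server (SHOW ENGINES tells you). The table name blackhole_demo is made up for illustration:

```sql
-- List the storage engines known to this server
SHOW ENGINES;

-- SEQUENCE: virtual tables named seq_<from>_to_<to> expose a single column, seq
SELECT seq FROM seq_1_to_5;

-- BLACKHOLE: writes are accepted and discarded, so there is nothing to read back
CREATE TABLE blackhole_demo (id INT) ENGINE=BLACKHOLE;
INSERT INTO blackhole_demo VALUES (1), (2);
SELECT COUNT(*) FROM blackhole_demo;
```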

● Files: CSV, JSON, XML, tables in HTML pages… files in archives… custom files…
● Remote databases: MySQL protocol, MongoDB, ODBC, JDBC
● Data transformation
● More

Installing CONNECT
INSTALL SONAME 'ha_connect';
If you get this error:
ERROR 1126 (HY000): Can't open shared library
'/usr/lib/mysql/plugin/ha_connect.so' (errno: 2, cannot open shared
object file: No such file or directory)
…you don’t have the plugin in plugin_dir. Most probably MariaDB was installed from a
Linux repository, in which case there should be a package for CONNECT:
● yum install MariaDB-connect-engine
● apt-get install mariadb-plugin-connect
If you installed MariaDB from a tarball, you’ll need to install the plugin’s dependencies yourself; ldd shows which shared libraries are missing.
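
To check the result, something like the following should work. It is only a sketch that uses the standard plugin_dir variable and the information_schema.PLUGINS table:

```sql
-- Where the server looks for plugin libraries such as ha_connect.so
SHOW GLOBAL VARIABLES LIKE 'plugin_dir';

-- After INSTALL SONAME succeeds, CONNECT should be listed here as ACTIVE
SELECT PLUGIN_NAME, PLUGIN_STATUS, PLUGIN_LIBRARY
    FROM information_schema.PLUGINS
    WHERE PLUGIN_NAME = 'CONNECT';
```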

Creating a CONNECT table (CSV)
CREATE OR REPLACE TABLE import_product (
    id INTEGER UNSIGNED NOT NULL,
    name VARCHAR(100) NOT NULL,
    quantity INTEGER NOT NULL,
    last_modified DATE NOT NULL date_format='YY/MM/DD'
)
    ENGINE=CONNECT
    TABLE_TYPE=CSV
    FILE_NAME='/var/import/product.csv'
    SEP_CHAR='\t'
    BLOCK_SIZE=2048
;

Copying data into a regular table
INSERT IGNORE INTO product
SELECT *
FROM import_product
WHERE last_modified > (CURRENT_DATE() - INTERVAL 7 DAY);

Inserting data into CONNECT
MariaDB [test]> INSERT INTO import_product
(id, name, quantity, last_modified)
VALUES (24, 'Sonic screwdriver', 100,
CURRENT_DATE());
Query OK, 1 row affected (0.067 sec)
MariaDB [test]> \! tail -1 /var/import/product.csv
24 Sonic screwdriver 100 22/06/20

CONNECT Indexes
● You can build indexes
● Indexes are stored in separate files
● Avoid this message by using proper character sets and specifying VARCHAR
length:
Specified key was too long; max key length is 255 bytes
● CONNECT indexes work better if rows are pre-ordered
ALTER TABLE import_product ADD INDEX idx_last_modified (last_modified);

CONNECT storage engine: merging tables
● We can merge multiple CONNECT tables:
CREATE OR REPLACE TABLE import_product_all
ENGINE=CONNECT
TABLE_TYPE=TBL
TABLE_LIST='import_product_1,import_product_2'
;

CONNECT storage engine: merging tables
● We can merge CONNECT and regular tables:
CREATE OR REPLACE TABLE product_proxy
ENGINE=CONNECT
TABLE_TYPE=PROXY
TABNAME=product
;
CREATE OR REPLACE TABLE import_product_all
ENGINE=CONNECT
TABLE_TYPE=TBL
TABLE_LIST='product_proxy,import_product_1,import_product_2'
;

CONNECT + SQL Server
● Column definitions can be automatically retrieved
● (though you may need to map column types & character sets)
● Remote indexes are used
CREATE TABLE import_product
    ENGINE=CONNECT
    TABLE_TYPE=ODBC
    TABNAME='product'
    CONNECTION='Driver=SQL Server Native Client 13.0;Server=sql-server-hostname;Database=shop;UID=mariadb_connect;PWD=secret';
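
The same idea applies to the MySQL protocol mentioned earlier. Below is a sketch, assuming a remote MariaDB or MySQL server. The host remote-host, the user connect_user, its password and the table name remote_product are all hypothetical:

```sql
-- Column definitions are discovered from the remote table,
-- so no column list is given here
CREATE OR REPLACE TABLE remote_product
    ENGINE=CONNECT
    TABLE_TYPE=MYSQL
    CONNECTION='mysql://connect_user:secret@remote-host:3306/shop/product';

-- The remote table can now be queried like a local one
SELECT COUNT(*) FROM remote_product;
```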

CONNECT data transformation
Some table types are used for data transformation:
● PIVOT presents the contents of another table, or the results of a query, as a pivot table
● XCOL “normalises” tables with a column containing a comma-separated list
● OCCUR is the opposite of PIVOT: it turns several columns holding the same kind of data into rows
Example (XCOL):
Source table:
    Scotland    Edinburgh, Glasgow
    England     London, Cambridge
XCOL table:
    Scotland    Edinburgh
    Scotland    Glasgow
    England     London
    England     Cambridge
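
A sketch of how the XCOL example above might be set up, following the pattern shown in the CONNECT documentation. The table and column names (country_city_list, country_city, cities) are made up for illustration, and the list is written without spaces after the commas to keep the separator handling simple:

```sql
-- A regular table holding a comma-separated list in one column
CREATE OR REPLACE TABLE country_city_list (
    country VARCHAR(50) NOT NULL,
    cities VARCHAR(200) NOT NULL
) ENGINE=InnoDB;

INSERT INTO country_city_list (country, cities) VALUES
    ('Scotland', 'Edinburgh,Glasgow'),
    ('England', 'London,Cambridge');

-- The XCOL table reads country_city_list and expands the column
-- named in colname into one row per list item
CREATE OR REPLACE TABLE country_city
    ENGINE=CONNECT
    TABLE_TYPE=XCOL
    TABNAME='country_city_list'
    OPTION_LIST='colname=cities';

SELECT country, cities FROM country_city;
```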

Temporal Tables
Temporal tables contain versioned rows.
Types of Temporal Tables:
● System-versioned
○ MariaDB automatically maintains row versions, with start and end timestamps
● Application-time
○ The application can use special SQL syntax to maintain and query row versions
● Bitemporal tables
○ Both system-versioned and application-time
In all cases, regular SQL queries will work and will only return current data
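
Application-time tables are not shown elsewhere in this talk, so here is a minimal sketch, assuming a MariaDB version that supports application-time periods. The table ticket_assignment, its columns and the period name assignment_period are hypothetical:

```sql
CREATE OR REPLACE TABLE ticket_assignment (
    ticket_id INT NOT NULL,
    assignee VARCHAR(50) NOT NULL,
    date_start DATE NOT NULL,
    date_end DATE NOT NULL,
    -- The application, not the server, decides what the period means
    PERIOD FOR assignment_period (date_start, date_end)
);

INSERT INTO ticket_assignment
    VALUES (3, 'alice', '2020-01-01', '2021-01-01');

-- Reassign only part of the period: the row is split so that
-- the time before and after the portion keeps the old assignee
UPDATE ticket_assignment
    FOR PORTION OF assignment_period FROM '2020-06-01' TO '2020-09-01'
    SET assignee = 'bob'
    WHERE ticket_id = 3;

SELECT * FROM ticket_assignment ORDER BY date_start;
```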

Temporal Tables: System-versioned
● MariaDB automatically maintains row versions, with start and end timestamps
● Or, one can have transaction IDs instead of timestamps
● There is no way to make a change that is not versioned
● You can use them even for proprietary applications that you can’t modify
● It is even possible to make a table system-versioned in a MariaDB replica
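
A sketch of the transaction ID variant, assuming an InnoDB table. The table name ticket_trx and the column names are made up for illustration:

```sql
CREATE OR REPLACE TABLE ticket_trx (
    id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
    summary VARCHAR(200) NOT NULL,
    -- BIGINT UNSIGNED columns hold transaction IDs instead of timestamps
    start_trxid BIGINT UNSIGNED GENERATED ALWAYS AS ROW START,
    end_trxid BIGINT UNSIGNED GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (start_trxid, end_trxid)
)
    ENGINE InnoDB
    WITH SYSTEM VERSIONING
;
```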

Temporal Tables
CREATE OR REPLACE TABLE ticket (
    id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
    state ENUM('OPEN', 'VERIFIED', 'FIXED', 'INVALID') NOT NULL DEFAULT 'OPEN',
    summary VARCHAR(200) NOT NULL,
    description TEXT NOT NULL
)
    ENGINE InnoDB
;
We want to start tracking changes to tickets over time.

Temporal Tables
ALTER TABLE ticket
    LOCK = SHARED,
    ALGORITHM = COPY,
    ADD COLUMN inserted_at TIMESTAMP(6) GENERATED ALWAYS AS ROW START INVISIBLE,
    ADD COLUMN deleted_at TIMESTAMP(6) GENERATED ALWAYS AS ROW END INVISIBLE,
    ADD INDEX idx_inserted_at (inserted_at),
    ADD INDEX idx_deleted_at (deleted_at),
    ADD PERIOD FOR SYSTEM_TIME (inserted_at, deleted_at),
    ADD SYSTEM VERSIONING;

Querying a system-versioned table
-- get current version of the rows
-- without the temporal columns (they’re INVISIBLE)
SELECT * FROM ticket;
-- get current version of the rows
-- with the temporal columns
SELECT *, inserted_at, deleted_at FROM ticket;
-- all current and old data
SELECT *, inserted_at, deleted_at
FROM ticket FOR SYSTEM_TIME ALL;
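
To see where the old versions come from, here is a small sketch. It assumes the ticket table is system-versioned with temporal columns named inserted_at and deleted_at, and that each statement runs in its own transaction (autocommit); the inserted values are invented:

```sql
INSERT INTO ticket (summary, description)
    VALUES ('Login fails', 'Error 500 when submitting the login form');
SET @id = LAST_INSERT_ID();

-- Each UPDATE closes the current version and opens a new one
UPDATE ticket SET state = 'VERIFIED' WHERE id = @id;
UPDATE ticket SET state = 'FIXED' WHERE id = @id;

-- A regular query only sees the current version
SELECT id, state FROM ticket WHERE id = @id;

-- The full history should list one version per state the ticket went through
SELECT id, state, inserted_at, deleted_at
    FROM ticket FOR SYSTEM_TIME ALL
    WHERE id = @id
    ORDER BY inserted_at;
```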

Get old versions of the rows
-- get deleted rows
SELECT *, inserted_at, deleted_at
    FROM ticket FOR SYSTEM_TIME
        FROM '1970-01-01' TO (NOW() - INTERVAL 1 MICROSECOND);
SELECT *, inserted_at, deleted_at
    FROM ticket FOR SYSTEM_TIME ALL
    WHERE deleted_at < NOW();

History of a row
SELECT id, state, inserted_at, deleted_at
    FROM ticket FOR SYSTEM_TIME ALL
    WHERE id = 3
    ORDER BY deleted_at;

Read a row from a specific point in time
SELECT id, state
    FROM ticket FOR SYSTEM_TIME AS OF TIMESTAMP '2020-08-22 08:52:36'
    WHERE id = 3;
SELECT id, state
    FROM ticket FOR SYSTEM_TIME ALL
    WHERE id = 3 AND
        '2020-08-22 08:52:36' BETWEEN inserted_at AND deleted_at;

Temporal JOINs
-- rows that were present on 2020-07-01
-- and had the same state one month later
SELECT t1.id, t1.inserted_at, t1.deleted_at
    FROM ticket FOR SYSTEM_TIME ALL AS t1
    INNER JOIN ticket FOR SYSTEM_TIME ALL AS t2
        ON t1.id = t2.id
        AND t1.state = t2.state
    WHERE '2020-07-01 00:00:00' BETWEEN t1.inserted_at AND t1.deleted_at
        AND '2020-08-01 00:00:00' BETWEEN t2.inserted_at AND t2.deleted_at
    ORDER BY t1.id;

Hints about other things you can do
● Stats on added/deleted rows by year, month, weekday, day, daytime…
● Stats on row lifetimes
● Get rows that never changed
● Anomaly detection: get rows that change too often, or change at weird times
● Examine history of a row to find problems
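
A few of these ideas, sketched as queries. They assume the ticket table is system-versioned with temporal columns named inserted_at and deleted_at; the threshold of 10 versions is arbitrary:

```sql
-- Row versions closed (by an UPDATE or a DELETE) per year and month
SELECT YEAR(deleted_at) AS y, MONTH(deleted_at) AS m, COUNT(*) AS closed_versions
    FROM ticket FOR SYSTEM_TIME ALL
    WHERE deleted_at < NOW()
    GROUP BY y, m
    ORDER BY y, m;

-- Average lifetime, in seconds, of the versions that are no longer current
SELECT AVG(TIMESTAMPDIFF(SECOND, inserted_at, deleted_at)) AS avg_lifetime_seconds
    FROM ticket FOR SYSTEM_TIME ALL
    WHERE deleted_at < NOW();

-- Tickets with exactly one version that is still current: they never changed
SELECT id
    FROM ticket FOR SYSTEM_TIME ALL
    GROUP BY id
    HAVING COUNT(*) = 1 AND MAX(deleted_at) > NOW();

-- Possible anomalies: tickets that changed unusually often
SELECT id, COUNT(*) AS versions
    FROM ticket FOR SYSTEM_TIME ALL
    GROUP BY id
    HAVING COUNT(*) > 10
    ORDER BY versions DESC;
```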

JSON use cases
● Build a prototype
● Store data to import/export in an intermediate form
● Complex, nested data
● Store partially heterogeneous data
● Store documents with inheritance
● Store arrays / lists / sets
● Store key/value pairs (a more standard replacement for SET or ENUM types)
Some of these will be elaborated on later

JSON use cases: partially heterogeneous data
Typical example: a product catalog
product: name, desc, price
shirt: name, desc, price, size, colour
phone: name, desc, price, size, brand, model, weight
Solutions:
● 1 table with a column for every property that at least 1 product has
● 1 table per product type
● 1 common table + 1 table per product type
● 1 table with common columns + 1 JSON column
○ (or more than 1)

CREATE TABLE product (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    type VARCHAR(50) NOT NULL
        COMMENT 'Determines which attributes a product has',
    name VARCHAR(50) NOT NULL,
    description TEXT NOT NULL DEFAULT '',
    cost DECIMAL(10, 2) NOT NULL,
    attributes JSON NOT NULL,
    UNIQUE unq_name_type (name, type)
);
BUT this query can’t use an index:
SELECT DISTINCT JSON_VALUE(attributes, '$.colour')
    FROM product
    WHERE type = 'shirt' AND JSON_VALUE(attributes, '$.size') = 'M';

CREATE TABLE product (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    type VARCHAR(50) NOT NULL
        COMMENT 'Determines which attributes a product has',
    name VARCHAR(50) NOT NULL,
    description TEXT NOT NULL DEFAULT '',
    cost DECIMAL(10, 2) NOT NULL,
    attributes JSON NOT NULL,
    colour VARCHAR(50) GENERATED ALWAYS AS (JSON_VALUE(attributes, '$.colour')) STORED,
    size VARCHAR(50) GENERATED ALWAYS AS (JSON_VALUE(attributes, '$.size')) STORED,
    UNIQUE unq_name_type (name, type),
    INDEX idx_size_colour (size, colour)
);
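
A self-contained sketch of the generated-column approach, using a cut-down table so it can be run on its own. The table product_demo and the sample rows are made up; JSON_VALUE is used because it returns the scalar without the surrounding JSON quotes:

```sql
CREATE OR REPLACE TABLE product_demo (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    type VARCHAR(50) NOT NULL,
    name VARCHAR(50) NOT NULL,
    attributes JSON NOT NULL,
    -- Extracted once at write time, so they can be indexed
    size VARCHAR(50) GENERATED ALWAYS AS (JSON_VALUE(attributes, '$.size')) STORED,
    colour VARCHAR(50) GENERATED ALWAYS AS (JSON_VALUE(attributes, '$.colour')) STORED,
    INDEX idx_size_colour (size, colour)
);

-- Only the JSON document is written; the generated columns follow it
INSERT INTO product_demo (type, name, attributes) VALUES
    ('shirt', 'Plain tee', JSON_OBJECT('size', 'M', 'colour', 'BLUE')),
    ('shirt', 'Polo', '{"size": "L", "colour": "WHITE"}');

-- The filter is now on regular columns, so idx_size_colour is a candidate
EXPLAIN SELECT DISTINCT colour
    FROM product_demo
    WHERE type = 'shirt' AND size = 'M';
```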

JSON with typical DBMS features
CHECK:
...
    colour VARCHAR(50) GENERATED ALWAYS AS (JSON_VALUE(attributes, '$.colour')) STORED
        CHECK (colour IN ('BLACK', 'WHITE', 'BLUE'))
)

UNIQUE indexes:
...
    phone_brand_colour VARCHAR(100) GENERATED ALWAYS AS (
        IF(
            type != 'phone',
            NULL,
            CONCAT_WS('.', model, JSON_VALUE(attributes, '$.colour'))
        )) STORED
)
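
The fragment above could be completed along these lines. This is only a sketch: the table phone_demo, the column phone_model_colour, the index name and the assumption that the model is stored inside the JSON document are all made up for illustration:

```sql
CREATE OR REPLACE TABLE phone_demo (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    type VARCHAR(50) NOT NULL,
    attributes JSON NOT NULL,
    -- NULL for anything that is not a phone, so other products are never compared
    phone_model_colour VARCHAR(100) GENERATED ALWAYS AS (
        IF(
            type != 'phone',
            NULL,
            CONCAT_WS('.',
                JSON_VALUE(attributes, '$.model'),
                JSON_VALUE(attributes, '$.colour'))
        )) STORED,
    UNIQUE unq_phone_model_colour (phone_model_colour)
);

INSERT INTO phone_demo (type, attributes)
    VALUES ('phone', '{"model": "X1", "colour": "BLACK"}');

-- Two identical shirts are fine: a UNIQUE index allows any number of NULLs
INSERT INTO phone_demo (type, attributes) VALUES
    ('shirt', '{"colour": "BLACK"}'),
    ('shirt', '{"colour": "BLACK"}');

-- A second phone with the same model and colour is rejected as a duplicate
INSERT INTO phone_demo (type, attributes)
    VALUES ('phone', '{"model": "X1", "colour": "BLACK"}');
```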

● With the same techniques shown earlier, the colour generated column can be
used in a foreign key
● But I don’t recommend the use of foreign keys

Thanks for attending!
federico.razzoli@vettabase.com
