What’s new in MariaDB ColumnStore

Andrew Hutchings
Technical Lead, MariaDB ColumnStore
MariaDB Corporation
Shane K Johnson
Senior Director of Product Marketing
MariaDB Corporation

Agenda
1. Quick overview of MariaDB ColumnStore
2. The evolution of MariaDB ColumnStore
3. Recap of key MariaDB ColumnStore 1.1 features
4. What’s new in MariaDB ColumnStore 1.2

MariaDB ColumnStore – overview (1/2)
Diagram labels: Server 1; Server 2; MariaDB Server; ColumnStore (interface); InnoDB; ColumnStore (storage); User Module (UM); Performance Module (PM); Disk (two instances)

MariaDB ColumnStore – overview (2/2)
Diagram labels: MariaDB MaxScale (two instances); MariaDB Server with ColumnStore (interface) and InnoDB (two instances); ColumnStore (storage) (five instances)

The evolution of
MariaDB ColumnStore

MariaDB ColumnStore 1.0 (Jan 2016)
Diagram labels:
MariaDB Server
ColumnStore
ColumnStore (storage)
MariaDB MaxScale
Applications
Import (cpimport)

MariaDB ColumnStore 1.1 (Dec 2017)
Diagram labels:
MariaDB Server
ColumnStore
User-defined window functions
ColumnStore (storage)
User-defined aggregate functions (distributed, single parameter)
Backup (parallel)
GlusterFS (HA)
Bulk data adapter
MariaDB MaxScale
Applications
Spark Connector
Kafka Connector
CDC Connector
C++, Java and Python APIs
Import (cpimport)
Write engine

MariaDB ColumnStore 1.2 (Dec 2018)
Diagram labels:
MariaDB Server
ColumnStore
User-defined window functions
ColumnStore (storage)
User-defined aggregate functions (distributed, multi parameter)
Backup (parallel)
GlusterFS (HA)
Bulk data adapter
MariaDB MaxScale
Applications
Spark Connector
Kafka Connector
CDC Connector
Pentaho Adapter
Regression functions
Import (cpimport)
Remote import (mcsimport)
C++, Java and Python APIs
Write engine

Recap of ColumnStore 1.1 key features
1. Bulk data adapters
2. CDC streaming data adapter
3. User-defined aggregate functions (distributed)

Bulk data adapters
1. For each row
   a. For each column: bulkInsert->setColumn
   b. bulkInsert->writeRow
2. bulkInsert->commit
* Buffers 100,000 rows by default
Diagram labels: Application (front end); MariaDB MaxScale; MariaDB Server with ColumnStore (interface) (two instances); Application/Service/Script (back end) with Bulk data adapter; ColumnStore (storage) with Write engine (three instances)
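The write flow on this slide (set each column, write the row, then commit, with rows buffered in batches) can be sketched in plain Python. This is not the ColumnStore bulk API: the class `BulkInsertSketch`, its method names, and the `flushed_batches` list are hypothetical stand-ins used only to show the order of calls and the effect of the row buffer.

```python
class BulkInsertSketch:
    """Hypothetical stand-in mimicking the setColumn / writeRow / commit flow."""

    def __init__(self, column_count, buffer_rows=100_000):
        self.column_count = column_count
        self.buffer_rows = buffer_rows          # slide: 100,000 rows buffered by default
        self._current = [None] * column_count   # row currently being filled
        self._buffer = []                       # rows waiting to be flushed
        self.flushed_batches = []               # stands in for batches handed to the write engine
        self.committed = False

    def set_column(self, index, value):
        self._current[index] = value
        return self

    def write_row(self):
        self._buffer.append(tuple(self._current))
        self._current = [None] * self.column_count
        if len(self._buffer) >= self.buffer_rows:
            self._flush()
        return self

    def _flush(self):
        if self._buffer:
            self.flushed_batches.append(self._buffer)
            self._buffer = []

    def commit(self):
        # Flush whatever is still buffered, then mark the load as complete.
        self._flush()
        self.committed = True


def load_rows(rows, column_count, buffer_rows=100_000):
    bulk = BulkInsertSketch(column_count, buffer_rows)
    for row in rows:                            # 1. for each row
        for index, value in enumerate(row):     # 1a. for each column
            bulk.set_column(index, value)
        bulk.write_row()                        # 1b. write the row
    bulk.commit()                               # 2. commit
    return bulk
```

With a buffer of two rows, loading three rows produces one full batch during the loop and one partial batch at commit time.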

Diagram labels: MariaDB Server (primary), InnoDB/MyRocks; MariaDB MaxScale; Binlog; Binlog router; Binlog server; MariaDB Server (secondary), InnoDB/MyRocks (two instances)

Diagram labels: App/Service/Script (backend); Bulk data adapter; MariaDB Server (primary), InnoDB/MyRocks; MariaDB MaxScale; Binlog; Binlog router; Avro router; CDC protocol; CDC client; ColumnStore (storage), Write engine; MariaDB Server, ColumnStore (interface); Binlog server

User-defined aggregate functions (distributed)
Example: calculate a weighted sum
Weights by cost range: 1-10 (0.5), 11-100 (1.0), >100 (1.5)
MariaDB Server, ColumnStore: combined result (WSUM=947)
ColumnStore (WSUM=405)
Cost   WSUM
10     5
100    100
200    300
ColumnStore (WSUM=26)
Cost   WSUM
4      2
8      4
20     20
ColumnStore (WSUM=516)
Cost   WSUM
12     6
60     60
300    450
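The idea behind a distributed aggregate is that each storage node computes a partial result over its own rows and the server combines the partials. A minimal Python sketch of that pattern for the weighted sum follows. It is not the ColumnStore UDAF SDK: the function names are hypothetical, the weight boundaries assume the third range means "greater than 100", the cost values for the first two nodes come from the slide, and the third node's values are made up for illustration.

```python
def weight_for(cost):
    """Weight by cost range: 1-10 -> 0.5, 11-100 -> 1.0, above 100 -> 1.5 (assumed)."""
    if cost <= 10:
        return 0.5
    if cost <= 100:
        return 1.0
    return 1.5


def partial_wsum(costs):
    """Runs on one node: weighted sum over that node's rows only."""
    return sum(cost * weight_for(cost) for cost in costs)


def combine(partials):
    """Runs on the server: merge the per-node partial results."""
    return sum(partials)


node_costs = [
    [10, 100, 200],   # from the slide
    [4, 8, 20],       # from the slide
    [60, 300],        # illustrative values, not from the slide
]
partials = [partial_wsum(costs) for costs in node_costs]
total = combine(partials)
print(partials, total)  # [405.0, 26.0, 510.0] 941.0
```

Because a sum can be merged by adding partial sums, only one number per node has to travel to the server rather than every row.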

What’s new in MariaDB ColumnStore 1.2

Pentaho Data Integration adapter
● This adapter implements the Pentaho Data Integration / Kettle SDK to enable rapid data loading into ColumnStore by leveraging the bulk load API
● It provides an orders-of-magnitude performance improvement over the DML-based adapter
● Supported on Windows 10, Ubuntu 16, and RHEL / CentOS 7
● For more details:
https://mariadb.com/kb/en/library/columnstore-streaming-data-adapters/#columnstore-pentaho-data-integration-data-adapter

Pentaho Data Integration adapter – usage
● As a consumer of the ColumnStore Bulk API, the adapter requires a copy of the cluster's ColumnStore.xml
● In addition, a JDBC connection is required for metadata and to support update / delete DML
● A target table must be defined as the destination for the data stream

Pentaho Data Integration adapter – usage
● After the target table is defined, the mapping from the input stream to the target table must be defined
● The "map all inputs" button attempts to map the columns automatically where possible

Remote import: mcsimport
● Batch
● CSV
● Command line
● Can run outside a UM/PM
● Local source files
● Auto committed
Diagram labels: Server running mcsimport with a CSV file; MariaDB Server (UM); PM 1, PM 2, ..., PM n, each with Write engine and Files

Windows support for adapters and tools
● Support is now provided for the bulk data adapter, mcsimport and the Pentaho Data Integration adapter on Microsoft Windows 10
● This opens up a broader range of integration opportunities (ETL and custom data loading) on the desktop
● A Windows-specific installer is provided, which installs the necessary dependencies
● Running ColumnStore itself on Windows is still best achieved through the Windows Subsystem for Linux or the Docker container with Docker for Windows

Multi-parameter Distributed UDAF
● Distributed user-defined aggregate functions (UDAF) can now take more than one parameter; both aggregate and window functions are supported
● Enables more complex functions to be distributed to PMs:
○ Multi-column functions (e.g., linear regression, which is implemented using this framework; details on the next slide)
○ Single-column functions with an extra parameter (e.g., custom percentile)
● Requires the C++ SDK, and the compiled library must be included on each node
● For more details see:
https://mariadb.com/kb/en/library/columnstore-user-defined-aggregate-and-window-functions/

Regression functions (1/2)
● REGR_AVGX(ColumnY, ColumnX)
○ Average of the independent variable (sum(ColumnX)/N), where N is the number of rows processed by the query
● REGR_AVGY(ColumnY, ColumnX)
○ Average of the dependent variable (sum(ColumnY)/N), where N is the number of rows processed by the query
● REGR_COUNT(ColumnY, ColumnX)
○ The total number of input rows in which both ColumnY and ColumnX are non-null
● REGR_INTERCEPT(ColumnY, ColumnX)
○ The y-intercept of the least-squares-fit linear equation determined by the (ColumnX, ColumnY) pairs

Regression functions (2/2)
● REGR_R2(ColumnY, ColumnX)
○ Square of the correlation coefficient between ColumnY and ColumnX (the coefficient of determination of the linear fit)
● REGR_SLOPE(ColumnY, ColumnX)
○ The slope of the least-squares-fit linear equation determined by the pairs
● REGR_SXX(ColumnY, ColumnX)
○ REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs
● REGR_SXY(ColumnY, ColumnX)
○ REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs
● REGR_SYY(ColumnY, ColumnX)
○ REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs
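To make the definitions on these two slides concrete, here is a small Python sketch that computes the same quantities from a list of (y, x) pairs. It is an illustration of the formulas only, not ColumnStore's implementation: the function name `regr_stats` and the returned dictionary keys are hypothetical, `None` stands in for SQL NULL, and the sketch assumes that only pairs where both values are non-null are counted. Returning `None` when a variance is zero is a simplification.

```python
def regr_stats(pairs):
    """Compute regression statistics from (y, x) pairs; None marks a NULL value."""
    clean = [(y, x) for y, x in pairs if y is not None and x is not None]
    n = len(clean)
    if n == 0:
        return {"count": 0}

    avg_x = sum(x for _, x in clean) / n
    avg_y = sum(y for y, _ in clean) / n

    # n * VAR_POP(x), n * VAR_POP(y) and n * COVAR_POP(y, x) respectively
    sxx = sum((x - avg_x) ** 2 for _, x in clean)
    syy = sum((y - avg_y) ** 2 for y, _ in clean)
    sxy = sum((x - avg_x) * (y - avg_y) for y, x in clean)

    slope = sxy / sxx if sxx else None
    intercept = avg_y - slope * avg_x if slope is not None else None
    r2 = (sxy * sxy) / (sxx * syy) if sxx and syy else None

    return {
        "count": n,
        "avgx": avg_x,
        "avgy": avg_y,
        "sxx": sxx,
        "syy": syy,
        "sxy": sxy,
        "slope": slope,
        "intercept": intercept,
        "r2": r2,
    }


# Points on the line y = 2x + 1, plus one pair that is skipped because x is NULL.
stats = regr_stats([(3, 1), (5, 2), (7, 3), (9, 4), (11, None)])
print(stats["count"], stats["slope"], stats["intercept"], stats["r2"])
```

For points that lie exactly on a line, the slope and intercept recover the line and the squared correlation is 1.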

Data types
● An explicit TIME datatype is now supported for capturing the time of day
○ This is very useful for financial applications
○ Avoids use of a custom numeric type as a workaround
○ Uses 8 bytes of storage
○ Supported range is '-838:59:59.999999' to '838:59:59.999999'
● Additionally, millisecond and microsecond precision for the DATETIME and TIME data types allows more fine-grained time values
● The Boolean data type is supported
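As a rough illustration of the TIME range and microsecond precision listed above, the following Python sketch parses a TIME literal into a signed count of microseconds and rejects values outside the stated range. The function name, the regular expression, and the choice of a microsecond count are assumptions made for this example; the sketch does not describe how ColumnStore actually encodes TIME values in its 8 bytes.

```python
import re

# Upper bound from the slide: '838:59:59.999999', expressed in microseconds.
MAX_TIME_US = ((838 * 60 + 59) * 60 + 59) * 1_000_000 + 999_999

_TIME_RE = re.compile(r"^(-?)(\d{1,3}):(\d{2}):(\d{2})(?:\.(\d{1,6}))?$")


def time_to_microseconds(text):
    """Parse '[-]HHH:MM:SS[.ffffff]' into a signed number of microseconds."""
    match = _TIME_RE.match(text)
    if not match:
        raise ValueError(f"not a TIME literal: {text!r}")
    sign, hours, minutes, seconds, fraction = match.groups()
    if int(minutes) > 59 or int(seconds) > 59:
        raise ValueError(f"minutes or seconds out of range: {text!r}")
    # Right-pad the fraction so '.5' means 500000 microseconds.
    micros = int((fraction or "").ljust(6, "0"))
    total = ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1_000_000 + micros
    if total > MAX_TIME_US:
        raise ValueError(f"outside the supported TIME range: {text!r}")
    return -total if sign else total


print(time_to_microseconds("00:00:01.5"))         # 1500000
print(time_to_microseconds("838:59:59.999999"))   # 3020399999999
```

The largest value is far below 2**63, so a signed 64-bit integer is more than enough to hold any value in the supported range.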

Additional features
● CREATE TABLE .. LIKE ..;
● GROUP BY is pushed down in vtable_mode 0 (executed by MariaDB Server)
● Reserved words and non-alphanumeric characters are supported in table and column names
● Cross-engine joins with SSL connections
● Non-root installs no longer require sudo privileges for the install user
○ Using the 'mysql' user is recommended
● 80 bug fixes
● 40+ bug fixes coming in the soon-to-be-released 1.2.3 maintenance release

Convergence
● Internal refactoring and preparation to move off a MariaDB Server fork
● MariaDB Server 10.4 will include additional optimizer and storage engine API
enhancements so we can complete the process
● Goal is to install ColumnStore on top of a standard MariaDB server installation
● postConfigure will still be required to configure the ColumnStore cluster
