With the new features of MariaDB 10.2, migrating existing Oracle-based applications has become much easier and thus economically advantageous. We present some of our best practices and introduce the Migration Practice of MariaDB.
2. What is MariaDB ColumnStore?
High-performance columnar storage engine that supports a wide variety of analytical use cases in highly scalable distributed environments
• Parallel query processing for distributed environments
• Faster, More Efficient Queries
• Single Interface for OLTP and analytics
• Easy to Manage and Scale
• Easier Enterprise Analytics
• Power of SQL and Freedom of Open Source to Big Data Analytics
• Better Price Performance
3. Rows/DataSize Scope
[Figure: scale axes. Rows: 1, 100, 10,000, 1,000,000, 100,000,000, 10,000,000,000, 100,000,000,000. Data size: 10-100 GB, 100-1000 GB, 1-10 TB, 10-100 TB ... PB. Regions shown: Transactional Databases; MariaDB ColumnStore Engine.]
MariaDB ColumnStore Technical Use Cases
● Data warehousing
○ Selective column based queries
○ Large number of dimensions
● High-performance analytics on large volumes of data
○ Reporting and analysis on billions of rows
○ From datasets containing trillions of rows
○ Terabytes to Petabytes of datasets
● Analytics requiring
○ Complex Joins, Windowing Functions
4. Row-oriented vs. Column-oriented format
• Row oriented:
– Rows stored sequentially in a file
– Scans through every record row by row
• Column oriented:
– Each column is stored in a separate file
– Scans only the relevant columns
ID  Fname     Lname  State  Zip    Phone           Age  Sex
1   Bugs      Bunny  NY     11217  (718) 938-3235  34   M
2   Yosemite  Sam    CA     95389  (209) 375-6572  52   M
3   Daffy     Duck   NY     10013  (212) 227-1810  35   M
4   Elmer     Fudd   ME     04578  (207) 882-7323  43   M
5   Witch     Hazel  MA     01970  (978) 744-0991  57   F
Column files (one per column):
ID:    1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip:   11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age:   34, 52, 35, 43, 57
Sex:   M, M, M, M, F
SELECT Fname FROM People WHERE State = 'NY'
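The difference between the two layouts for this query can be sketched in a few lines of Python. This is a toy model only, not how ColumnStore is implemented: the function names and the "values read" counter are illustrative assumptions, and a real engine reads disk blocks rather than individual values.

```python
# Toy comparison of a row-oriented and a column-oriented scan for:
#   SELECT Fname FROM People WHERE State = 'NY'
# All names here are illustrative; nothing below is a ColumnStore API.

COLUMNS = ("ID", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex")
PEOPLE_ROWS = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]


def to_column_store(rows, columns):
    """Pivot row tuples into one list per column (one 'file' per column)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def row_scan(rows, columns):
    """Row-oriented scan: every field of every row is read."""
    state_idx = columns.index("State")
    fname_idx = columns.index("Fname")
    result, values_read = [], 0
    for row in rows:
        values_read += len(row)  # the whole record is touched
        if row[state_idx] == "NY":
            result.append(row[fname_idx])
    return result, values_read


def column_scan(store):
    """Column-oriented scan: only the State and Fname columns are read."""
    state = store["State"]
    matches = [i for i, value in enumerate(state) if value == "NY"]
    result = [store["Fname"][i] for i in matches]
    values_read = len(state) + len(matches)  # filter column + projected hits
    return result, values_read


if __name__ == "__main__":
    print(row_scan(PEOPLE_ROWS, COLUMNS))
    print(column_scan(to_column_store(PEOPLE_ROWS, COLUMNS)))
```

Both scans return the same names; the column scan simply touches far fewer values because the other six columns are never opened.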
5. MariaDB ColumnStore Architecture
[Architecture diagram, top to bottom]
Clients: BI Tool | SQL Client | Custom Big Data App | Application
MariaDB SQL Front End (User Module)
Distributed Query Engine (Performance Module)
Data Storage: Columnar Distributed Data Storage
Local Storage | SAN/NAS | EBS | GlusterFS | CEPH
6. Storage Architecture
• Columnar storage
– Each column stored as a separate file
– No index management for query performance tuning
– Online schema changes: add a new column without impacting running queries
• Automatic horizontal partitioning
– Logical partition every 8 million rows
– In-memory metadata of partition min and max
– No partition management for query performance tuning
• Compression
– Accelerate decompression rate
– Reduce I/O for compressed blocks
[Diagram: Column 1, Column 2, Column 3 ... Column N, each a vertical partition; every column is divided into Extent 1 (8 million rows, 8MB~64MB), Extent 2 (8 million rows) ... Extent M (8 million rows), each a horizontal partition]
Data automatically arranged by
• Column – acts as vertical partitioning
• Extents – act as horizontal partitions
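As a rough illustration of the extent idea, the sketch below cuts one column into fixed-size extents and records each extent's minimum and maximum value. It assumes a plain Python list stands in for a column file; `build_extent_map` and `extent_of` are hypothetical helper names, and the tiny extent size is only for demonstration (the slide states 8 million rows per extent).

```python
# Minimal sketch: split one column into extents and keep min/max metadata.
# Helper names are hypothetical; real extents hold about 8 million rows.


def extent_of(row_index, extent_rows):
    """Return the 1-based extent number that holds a 0-based row index."""
    return row_index // extent_rows + 1


def build_extent_map(column_values, extent_rows):
    """Return one metadata record (first row, row count, min, max) per extent."""
    extents = []
    for start in range(0, len(column_values), extent_rows):
        chunk = column_values[start:start + extent_rows]
        extents.append({
            "extent": len(extents) + 1,
            "first_row": start,
            "rows": len(chunk),
            "min": min(chunk),
            "max": max(chunk),
        })
    return extents


if __name__ == "__main__":
    ages = [34, 52, 35, 43, 57, 29, 61, 48, 33, 40]
    for record in build_extent_map(ages, extent_rows=4):
        print(record)
```

Because the min/max records are small, they can be held in memory and consulted before any column data is read.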
7. High Performance Data Ingestion
• Fully parallel high-speed data load
– The cpimport utility works directly with performance module write engines across nodes for maximum performance.
– Tables can be loaded concurrently.
– Queries can run concurrently with loads, with transactionally consistent results.
• Micro-batch loading for real-time data flow.
• DML, INSERT INTO .. SELECT and LOAD DATA INFILE are also supported.
[Diagram labels: cpimport, Data Feed, User Module (UM), Performance Module (PM)]
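A micro-batch feed could be prepared along the following lines. This is a hedged sketch, not a documented workflow: the helper names, database and table names, and file paths are all hypothetical. It assumes cpimport takes the database name, table name and load file as positional arguments, with `-s` setting the column delimiter; check the cpimport documentation for your version before relying on that. The command is only built here, never executed.

```python
# Sketch of micro-batch preparation for cpimport (nothing is executed).
# Helper names, database/table names and paths are hypothetical.
from pathlib import Path


def micro_batches(rows, batch_size):
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def write_batch(rows, path, delimiter="|"):
    """Write rows as delimiter-separated UTF-8 text, one row per line."""
    lines = [delimiter.join(str(value) for value in row) for row in rows]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def cpimport_command(database, table, load_file, delimiter="|"):
    """Build the argument list for a cpimport call (assumed CLI shape)."""
    return ["cpimport", "-s", delimiter, database, table, str(load_file)]


if __name__ == "__main__":
    feed = [(1, "Laptop", 5), (2, "Monitor", 5), (3, "Mouse", 1)]
    for number, batch in enumerate(micro_batches(feed, batch_size=2)):
        print(cpimport_command("mydb", "orders", f"/tmp/orders_{number}.tbl"))
```

On a node where cpimport is installed, each argument list could be handed to `subprocess.run` after the matching batch file has been written.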
8. Shared Nothing Distributed Data Storage
[Diagram: SQL arrives at the User Module (UM); column primitives are sent to the Performance Modules (PM); primitive operations flow down from UM to PM and intermediate results flow back up]
Distributed Query Processing
• Query received and parsed by the MariaDB Front End on the UM
• Storage Engine Plugin breaks the query down into primitive operations and distributes them across the PMs
• Primitives processed on the PMs
– Execute column restrictions and projections
– Execute group by/aggregation against local data
• Each PM works on primitives in parallel threads, fully distributed
• Each primitive executes in a fraction of a second
• Intermediate results are returned to the UM
Massively parallel, distributed query processing; shared-nothing architecture
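The scatter/gather flow above can be mimicked with a small Python sketch. It is a simplified model under stated assumptions: threads stand in for Performance Modules, each holding its own list of (item, quantity) rows, and the function names `pm_partial_sums` and `um_execute` are invented for illustration.

```python
# Toy model of distributed group-by: PMs aggregate local data in parallel,
# the UM merges the intermediate results. Names are illustrative only.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def pm_partial_sums(local_rows, min_quantity):
    """PM side: apply the column restriction, then aggregate local rows."""
    partial = defaultdict(int)
    for item, quantity in local_rows:
        if quantity >= min_quantity:      # column restriction
            partial[item] += quantity     # local group by / aggregation
    return dict(partial)


def um_execute(pm_partitions, min_quantity):
    """UM side: send the primitive to every PM and merge what comes back."""
    with ThreadPoolExecutor(max_workers=len(pm_partitions)) as pool:
        partials = list(
            pool.map(lambda rows: pm_partial_sums(rows, min_quantity),
                     pm_partitions)
        )
    totals = defaultdict(int)
    for partial in partials:              # merge intermediate results
        for item, subtotal in partial.items():
            totals[item] += subtotal
    return dict(totals)


if __name__ == "__main__":
    partitions = [
        [("Laptop", 5), ("Monitor", 5)],  # rows held by the first PM
        [("Mouse", 1), ("Laptop", 3)],    # rows held by the second PM
    ]
    print(um_execute(partitions, min_quantity=2))
```

Only the small per-item subtotals travel back to the UM, which is why the final merge stays cheap even when each PM scans a large amount of local data.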
9. High Performance Query Processing
Storage Architecture reduces I/O
• Only touch column files that are in filter, projection, group by, and join conditions
• Eliminate disk block touches to partitions outside filter and join conditions
SELECT Item, sum(Quantity) FROM Orders
WHERE ShipDate between '2016-01-01' and '2016-01-31'
GROUP BY Item
Horizontal partitions (8 million rows each):
Extent 1: ShipDate 2016-01-12 - 2016-03-05
Extent 2: ShipDate 2016-03-05 - 2016-09-23 (ELIMINATED PARTITION)
Extent 3: ShipDate 2016-09-24 - 2017-01-06 (ELIMINATED PARTITION)
Id     OrderId  Line  Item     Quantity  Price  Supplier  ShipDate    ShipMode
1      1        1     Laptop   5         1000   Dell      2016-01-12  G
2      1        2     Monitor  5         200    LG        2016-01-13  G
3      2        1     Mouse    1         20     Logitech  2016-02-05  M
4      3        1     Laptop   3         1600   Apple     2016-01-31  P
...    ...      ...   ...      ...       ...    ...       ...         ...
8M     ...      ...   ...      ...       ...    ...       2016-03-05  ...
8M+1   ...      ...   ...      ...       ...    ...       2016-03-05  ...
...    ...      ...   ...      ...       ...    ...       ...         ...
16M    ...      ...   ...      ...       ...    ...       2016-09-23  ...
16M+1  ...      ...   ...      ...       ...    ...       2016-09-24  ...
...    ...      ...   ...      ...       ...    ...       ...         ...
24M    ...      ...   ...      ...       ...    ...       2017-01-06  ...
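Extent elimination for this query can be sketched as a range-overlap test against the per-extent ShipDate min/max values from the slide. The function name `extents_to_scan` and the tuple layout are assumptions made for illustration; the real engine keeps this metadata internally.

```python
# Sketch of extent (partition) elimination using min/max metadata.
# The ShipDate ranges come from the slide; everything else is illustrative.
from datetime import date

EXTENTS = [
    ("Extent 1", date(2016, 1, 12), date(2016, 3, 5)),
    ("Extent 2", date(2016, 3, 5), date(2016, 9, 23)),
    ("Extent 3", date(2016, 9, 24), date(2017, 1, 6)),
]


def extents_to_scan(extents, low, high):
    """Keep only extents whose [min, max] range overlaps [low, high]."""
    return [
        name
        for name, minimum, maximum in extents
        if maximum >= low and minimum <= high
    ]


if __name__ == "__main__":
    # WHERE ShipDate between '2016-01-01' and '2016-01-31'
    print(extents_to_scan(EXTENTS, date(2016, 1, 1), date(2016, 1, 31)))
```

For the January 2016 filter only Extent 1 overlaps, so the blocks of Extents 2 and 3 are never read.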
10. Analytics
• In-database distributed analytics with complex joins, aggregation, and window functions
• Cross Engine Join allows queries to reference both ColumnStore and non-ColumnStore tables.
• Extensible User Defined Functions allow creation of specialized logic executed at the PM level.
• Standard MariaDB Connectors provide out-of-the-box integration with:
– BI Tools (Tableau, Pentaho, ..)
– Custom Application Code (Java, Scala, C#, Python, ..)
– Data Processing Frameworks (R, Spark, NumPy, ..)
Item ID  Server_date  Revenue    Running Average
1        2017-02-01   20,000.00  20,000.00
1        2017-02-02    5,001.00  12,500.50
2        2017-02-01   15,000.00  15,000.00
2        2017-02-04   34,029.00  34,029.00
2        2017-02-05    7,138.00  20,583.50
3        2017-02-01   17,250.00  17,250.00
3        2017-02-03   25,010.00  25,010.00
3        2017-02-04   21,034.00  23,022.00
3        2017-02-05    4,120.00  12,577.00
Window Function Example: Daily Running Average Revenue by Item
SELECT item_id, server_date, revenue,
AVG(revenue) OVER
(PARTITION BY item_id ORDER BY server_date
RANGE INTERVAL 1 DAY PRECEDING) running_avg
FROM web_item_sales;
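To make the window frame concrete, the following Python sketch recomputes the running average outside the database. It assumes the frame means "rows of the same item dated from one day before the current row up to the current row"; the function name `running_avg` is illustrative, and the sales figures are the ones shown on the slide.

```python
# Recompute the slide's running average in plain Python.
# Frame assumed: same item, server_date within 1 day before the current row.
from datetime import date, timedelta

SALES = [
    (1, date(2017, 2, 1), 20000.0),
    (1, date(2017, 2, 2), 5001.0),
    (2, date(2017, 2, 1), 15000.0),
    (2, date(2017, 2, 4), 34029.0),
    (2, date(2017, 2, 5), 7138.0),
    (3, date(2017, 2, 1), 17250.0),
    (3, date(2017, 2, 3), 25010.0),
    (3, date(2017, 2, 4), 21034.0),
    (3, date(2017, 2, 5), 4120.0),
]


def running_avg(rows, days_preceding=1):
    """Return (item_id, server_date, average) for each input row."""
    span = timedelta(days=days_preceding)
    output = []
    for item_id, server_date, _ in rows:
        window = [
            revenue
            for other_item, other_date, revenue in rows
            if other_item == item_id
            and server_date - span <= other_date <= server_date
        ]
        output.append((item_id, server_date, sum(window) / len(window)))
    return output


if __name__ == "__main__":
    for item_id, server_date, average in running_avg(SALES):
        print(item_id, server_date, f"{average:,.2f}")
```

Rows with no sale on the previous day (for example item 2 on 2017-02-04) average only their own revenue, which matches the table.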
[Diagram: BI Tool, Custom Big Data App, and Data Processing Framework connect through JDBC / ODBC / Connector]
11. Enterprise Grade
• Enterprise Grade Security
– SSL, role-based access, auditability.
– MaxScale database firewall
• Deployment Flexibility
– Run on commodity Linux servers on premises or in the cloud.
– AWS-optimized AMI.
– Add horizontal capacity as you grow.
• High Availability
– Automatic UM failover
– Automatic PM failover with distributed data attachment across all PMs in SAN and EBS environments
[Diagram: Load Balancer - MaxScale; User Module (UM); Performance Module (PM); Data Storage: Shared-Nothing Distributed Data Storage, compressed by default]
12. Internationalization
Post Install Configuration
• my.cnf:
[client]
default-character-set=utf8
[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
init-connect='SET NAMES utf8'
• Columnstore.xml:
<SystemConfig>
<SystemLang>en_US.utf8</SystemLang>
Usage
• Create table specifying utf8:
create table airports
(name varchar(30), ..)
engine=columnstore
default character set 'utf8';
• cpimport files must be utf8 encoded.
• Table names containing multibyte characters are not yet supported.
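Since cpimport input must be UTF-8, a load file can be checked before it is handed over. The sketch below is one possible pre-flight check, not part of ColumnStore: `is_utf8_file` is a hypothetical helper that decodes the file incrementally so large files do not have to fit in memory.

```python
# Pre-flight check that a load file is valid UTF-8 (hypothetical helper).
import codecs


def is_utf8_file(path, chunk_size=1 << 20):
    """Return True if the whole file decodes as UTF-8, False otherwise."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                decoder.decode(chunk)
            decoder.decode(b"", final=True)  # reject a truncated last character
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, "OK" if is_utf8_file(name) else "NOT UTF-8")
```

A file saved in another encoding, such as Shift_JIS, fails the check and can be converted before loading.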
ColumnStore 1.0 supports the UTF8 character set, which allows storing Japanese text. More details:
https://mariadb.com/kb/en/mariadb/mariadb-columnstore-system-usage/
13. InfiniDB Migration
ColumnStore 1.0 remains binary compatible with InfiniDB 4.6 storage, which allows migration:
● Upgrade on same servers to ColumnStore 1.0:
https://mariadb.com/kb/en/mariadb/upgrade-from-infinidb-4x-to-mariadb-columnstore-1xx/
● Migrate to new ColumnStore 1.0 servers:
https://mariadb.com/kb/en/mariadb/migrating-from-infinidb-4x-to-mariadb-columnstore/
14. Coming Soon - ColumnStore 1.1
● Text / Blob datatype support
● Bulk Write API Connector
○ Kafka integration
○ Replication integration
○ Custom
● User Defined Aggregate & Window functions.
● Data Redundancy for local storage.
● Installation improvements.
● Performance & stability improvements.
● MariaDB Server 10.2
15. MariaDB ColumnStore In Summary
● Flexible deployment: cloud or on-premise commodity server
● Open source big data analytics
● High data compression
● In-database distributed analytics
● Cross-join with OLTP engines
● Enterprise grade security and high availability
● Easy to manage and scale
● Parallel, distributed query processing
● Columnar optimized
● Faster, More Efficient Queries
● Easier Enterprise Analytics
● Better Price Performance
16. Where to find MariaDB ColumnStore?
SOFTWARE DOWNLOAD https://mariadb.com/downloads/columnstore
SOURCE https://github.com/mariadb-corporation/mariadb-columnstore-engine
DOCUMENTATION https://mariadb.com/kb/en/mariadb/mariadb-columnstore/
BLOGS https://mariadb.com/blog-tags/columnstore