Transactional and Analytics together: MariaDB and ColumnStore

TRANSACTIONAL AND ANALYTICS TOGETHER
UNDERSTANDING THE ARCHITECTURE OF MARIADB COLUMNSTORE
Maria Luisa Raviol
Senior Sales Engineer EMEA
MariaDB Corporation

Database workloads
Current data
Range queries
Known queries
Transactional
Historical data
Aggregate queries
Unknown queries
Analytical

[Charts: analytical and transactional workloads plotted on Performance and Range axes; annotations: "More data", "More customers"]

[Charts: AX and TX workloads plotted on Performance and Range axes, one chart for the database (OLTP) and one for the data warehouse (OLAP)]

Database workloads
Current data
Range queries
Known queries
Row-based storage
Indexes
Clustered/Replicated
Transactional
Historical data
Aggregate queries
Unknown queries
Columnar storage
No indexes
Distributed
Analytical

Existing Approaches
Limited real time analytics
Slow releases of product innovation
Expensive hardware and software
Data Warehouses
Hadoop / NoSQL
LIMITED SQL SUPPORT
DIFFICULT TO INSTALL/MANAGE
LIMITED TALENT POOL
DATA LAKE W/ NO DATA MANAGEMENT
Hard to use

[Charts: AX and TX workloads plotted on Performance and Range axes. Database (OLTP): application development. Data warehouse (OLAP): BI/reporting + data science]

Application (eCommerce)
Transactional: Show me all new products in the science fiction category.
Analytical: Show me the top products added to shopping carts or purchased today, and with low inventory.
Actionable insight: I should buy one now because everyone wants one, and they’ll be sold out by the end of the day!

Hybrid workloads: the problem
Database (OLTP): transactions (app/dev)
Data warehouse (OLAP): analytics (BI/reporting + data science)

Hybrid workloads: the solution
Database (Hybrid): transactions and analytics (app/dev)
Data warehouse (OLAP): analytics (BI/reporting + data science)

MariaDB TX 3.0: MariaDB Server 10.3 (InnoDB/MyRocks), MariaDB MaxScale 2.2
MariaDB AX 2.0: MariaDB Server 10.2, ColumnStore 1.2
MariaDB Platform X3: MariaDB Server 10.3 (InnoDB/MyRocks), MariaDB Server 10.3 (ColumnStore 1.3)

The database proxy inspects queries and routes them to transactional
and/or analytical database instances.
MariaDB Platform X3
CDC
MariaDB Server 10.3
InnoDB/MyRocks
MariaDB Server 10.3
ColumnStore 1.3
Transactional Analytical
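
As a rough illustration of query-based routing, the sketch below classifies a statement as transactional or analytical with a simple keyword rule. The rule, the `route` function and the backend names are assumptions made for this example; they do not describe how MariaDB MaxScale actually decides.

```python
import re

# Hypothetical rule: reads that aggregate or group are treated as analytical.
ANALYTICAL_PATTERN = re.compile(
    r"\b(group\s+by|sum|avg|count|min|max)\b", re.IGNORECASE
)


def route(query: str) -> str:
    """Return the name of the backend a query would be sent to."""
    statement = query.strip().lower()
    # Writes always go to the transactional (row-based) instance.
    if statement.startswith(("insert", "update", "delete")):
        return "transactional"
    # Aggregating reads go to the analytical (columnar) instance.
    if ANALYTICAL_PATTERN.search(statement):
        return "analytical"
    # Everything else is treated as a short transactional read.
    return "transactional"


lookup_target = route("SELECT Fname FROM People WHERE State = 'NY'")
report_target = route("SELECT Item, sum(Quantity) FROM Orders GROUP BY Item")
write_target = route("INSERT INTO People (Key) VALUES (6)")
```

A real proxy would parse the statement rather than match keywords; the point here is only that one connection can serve both kinds of query.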

The database proxy inspects queries and routes them to transactional
and/or analytical database instances.
The change-data-capture stream replicates all writes from transactional
databases to analytical databases within microbatches.
MariaDB Platform X3
CDC
MariaDB Server 10.3
InnoDB/MyRocks
MariaDB Server 10.3
ColumnStore 1.3
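
The microbatch idea behind the change-data-capture stream can be sketched in a few lines. This is a minimal model under stated assumptions: the change records, `microbatches` and `apply_batch` are hypothetical names for this example, and a plain dictionary of lists stands in for the analytical database.

```python
from typing import Dict, Iterable, Iterator, List


def microbatches(changes: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group a stream of change records into batches of at most batch_size."""
    batch: List[dict] = []
    for change in changes:
        batch.append(change)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def apply_batch(column_store: Dict[str, list], batch: List[dict]) -> None:
    """Append one batch to the columnar copy, one column at a time."""
    for column in column_store:
        column_store[column].extend(change[column] for change in batch)


# Writes captured from the transactional side (illustrative data).
captured_writes = [{"id": i, "item": f"item-{i}"} for i in range(1, 6)]

analytical_copy: Dict[str, list] = {"id": [], "item": []}
batch_sizes = []
for batch in microbatches(captured_writes, batch_size=2):
    apply_batch(analytical_copy, batch)
    batch_sizes.append(len(batch))
```

Batching matters because the columnar side handles bulk appends far better than single-row inserts.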

Containers
MariaDB Platform X3
CDC
MariaDB Server 10.3
InnoDB/MyRocks
MariaDB Server 10.3
ColumnStore 1.3
Kubernetes (Helm) Docker (Compose)

Import bulk data
Containers
MariaDB Platform X3
CDC
MariaDB Server 10.3
InnoDB/MyRocks
MariaDB Server 10.3
ColumnStore 1.3
Spark connector
C/Java/Python API
Ingest streaming data
Kafka connector

Applications
Containers
MariaDB Platform X3
CDC
MariaDB Server 10.3
InnoDB/MyRocks
MariaDB Server 10.3
ColumnStore 1.3
C JDBC ODBC Node.js
Kafka connector
Import bulk data
Spark connector
C/Java/Python API

Applications
Containers
MariaDB Platform X3
CDC
MariaDB Server 10.3
InnoDB/MyRocks
MariaDB Server 10.3
ColumnStore 1.3
C JDBC ODBC Node.js
Kafka connector
Administration
SQL Diagnostic
Manager
SQLyog
MariaDB Backup
MariaDB Flashback
Import bulk data
Spark connector
C/Java/Python API

Row-oriented vs. Column-oriented format
● Row oriented:
○ Rows stored sequentially in a file
○ Scans through every record row by row
● Column oriented:
○ Each column is stored in a separate file
○ Scans only the relevant columns

Row-oriented table:
ID  Fname     Lname  State  Zip    Phone           Age  Sex
1   Bugs      Bunny  NY     11217  (718) 938-3235  34   M
2   Yosemite  Sam    CA     95389  (209) 375-6572  52   M
3   Daffy     Duck   NY     10013  (212) 227-1810  35   M
4   Elmer     Fudd   ME     04578  (207) 882-7323  43   M
5   Witch     Hazel  MA     01970  (978) 744-0991  57   F

Column-oriented files:
ID: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F

SELECT Fname FROM People WHERE State = 'NY'
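
To make the difference concrete, the sketch below runs the query above against both layouts and counts how many values each one has to read. It is a toy in-memory model, not ColumnStore code: Python lists stand in for files, and the function names are made up for this example.

```python
# The five sample rows from the slide, stored row by row.
NAMES = ["ID", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex"]
ROWS = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]

# The same data with one list per column (a stand-in for one file per column).
COLUMNS = {name: [row[i] for row in ROWS] for i, name in enumerate(NAMES)}


def select_fname_row_store(state):
    """Scan every record in full; return (matching Fnames, values read)."""
    values_read = 0
    result = []
    for row in ROWS:
        values_read += len(row)  # the whole record is read
        if row[NAMES.index("State")] == state:
            result.append(row[NAMES.index("Fname")])
    return result, values_read


def select_fname_column_store(state):
    """Scan only State, then fetch Fname for the matching positions."""
    state_column = COLUMNS["State"]
    fname_column = COLUMNS["Fname"]
    values_read = len(state_column)  # the filter column is read in full
    result = []
    for position, value in enumerate(state_column):
        if value == state:
            result.append(fname_column[position])
            values_read += 1  # one Fname value per match
    return result, values_read


row_result = select_fname_row_store("NY")
column_result = select_fname_column_store("NY")
```

Both calls return Bugs and Daffy, but the row scan reads 40 values (5 rows of 8 columns) while the column scan reads 7 (5 State values plus 2 Fname values).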

Single-Row Operations - Insert
Row oriented: new rows appended to the end.
Column oriented: new value added to each file.

Row-oriented table:
Key  Fname     Lname    State  Zip    Phone           Age  Sex
1    Bugs      Bunny    NY     11217  (718) 938-3235  34   M
2    Yosemite  Sam      CA     95389  (209) 375-6572  52   M
3    Daffy     Duck     NY     10013  (212) 227-1810  35   M
4    Elmer     Fudd     ME     04578  (207) 882-7323  43   M
5    Witch     Hazel    MA     01970  (978) 744-0991  57   F
6    Marvin    Martian  CA     91602  (818) 761-9964  26   M

Column-oriented files:
Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F

Columnar storage is not efficient for singleton inserts (OLTP): a single insert touches every column file instead of one row. Batch loads into column-oriented storage are faster (compression, no indexes).
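
A back-of-the-envelope cost model shows why singleton inserts hurt and batch loads do not. This is a deliberately simplified sketch: it counts file appends only and ignores compression, block sizes and caching. The function names are hypothetical.

```python
import math

NUM_COLUMNS = 8  # Key, Fname, Lname, State, Zip, Phone, Age, Sex


def row_store_appends(num_rows):
    """Row store: each new row is one append to the end of the table file."""
    return num_rows


def column_store_singleton_appends(num_rows, num_columns=NUM_COLUMNS):
    """Column store, one row at a time: every insert appends to every column file."""
    return num_rows * num_columns


def column_store_batch_appends(num_rows, batch_size, num_columns=NUM_COLUMNS):
    """Column store, batched: each batch appends one run of values per column file."""
    batches = math.ceil(num_rows / batch_size)
    return batches * num_columns


single_row_cost = column_store_singleton_appends(1)
bulk_cost = column_store_batch_appends(1_000_000, batch_size=100_000)
```

Under this model, inserting one row costs 8 appends in the column store against 1 in the row store, while loading a million rows in batches of 100,000 costs 80 appends in total.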

Single-Row Operations - Update
Row oriented: updating 100% of rows means changing 100% of blocks on disk.
Column oriented: only the blocks that need to be updated are changed.

Row-oriented table:
Key  Fname     Lname  State  Zip    Phone           Age  Sex
1    Bugs      Bunny  NY     11217  (718) 938-3235  34   M
2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M
3    Daffy     Duck   NY     10013  (212) 227-1810  35   M
4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M
5    Witch     Hazel  MA     01970  (978) 744-0991  57   F

Column-oriented files:
Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F

Single-Row Operations - Delete
Row oriented: rows deleted.
Column oriented: value deleted from each file.

Row-oriented table:
Key  Fname     Lname  State  Zip    Phone           Age  Sex
1    Bugs      Bunny  NY     11217  (718) 938-3235  34   M
2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M
3    Daffy     Duck   NY     10013  (212) 227-1810  35   M
4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M
5    Witch     Hazel  MA     01970  (978) 744-0991  57   F

Column-oriented files:
Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F

Changing the table structure
Row oriented: requires rebuilding the whole table.
Column oriented: create a new file for the new column.
Column-oriented storage is very flexible for adding columns: no full table rebuild is required.

Row-oriented table:
Key  Fname     Lname  State  Zip    Phone           Age  Sex  Active
1    Bugs      Bunny  NY     11217  (718) 938-3235  34   M    Y
2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M    N
3    Daffy     Duck   NY     10013  (212) 227-1810  35   M    N
4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M    Y
5    Witch     Hazel  MA     01970  (978) 744-0991  57   F    N

Column-oriented files:
Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F
Active: Y, N, N, Y, N
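
Adding the Active column in a columnar layout can be sketched as follows. This is a toy model in which a dictionary of lists stands in for the per-column files; `add_column` is a hypothetical helper, not a ColumnStore API.

```python
def add_column(table, name, values):
    """Add a column by creating one new list; existing columns are not rewritten."""
    row_count = len(next(iter(table.values())))
    if name in table:
        raise ValueError(f"column {name!r} already exists")
    if len(values) != row_count:
        raise ValueError("new column must have one value per existing row")
    table[name] = list(values)


# A subset of the slide's columns, one list per column "file".
people = {
    "Key": [1, 2, 3, 4, 5],
    "Fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch"],
    "State": ["NY", "CA", "NY", "ME", "MA"],
}

fname_before = people["Fname"]  # keep a reference to check it is untouched
add_column(people, "Active", ["Y", "N", "N", "Y", "N"])
fname_untouched = people["Fname"] is fname_before
```

The existing column lists are the same objects before and after the call, which mirrors the slide's point that only a new file is created.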

Easier Enterprise Analytics
Single SQL Front-end
• Use a single SQL interface for analytics and OLTP
• Leverage MariaDB security features: encryption for data in motion, role-based access and auditing
Full ANSI SQL
• No more SQL-“like” queries
• Supports complex joins, aggregation and window functions
Easy to manage and scale
• Eliminates the need for indexes and views
• Automated horizontal/vertical partitioning
• Scales linearly by adding new nodes as data grows
• Out-of-the-box connection with BI tools
• 90.3% cost reduction per TB per year

Faster, More Efficient Queries
Parallel Query Processing
Optimized for columnar storage
• Columnar storage reduces disk I/O
• Blazing-fast read-intensive workloads
• Ultra-fast data import
Parallel distributed query execution
• Distributes queries into a series of parallel operations
• Fully parallel high-speed data ingestion
• TPC-H lineitem table: 750K to 1 million rows per second
Highly available analytic environment
• Built-in redundancy
• Automatic failover
MariaDB AX customers across industries: auto parts, finance, ad analytics, asset management, telecommunications, healthcare, digital media, carpooling app

MariaDB Analytics
[Diagram: clients connect through MariaDB MaxScale to User Modules (UM), each a MariaDB Server with ColumnStore, which sit on top of Performance Modules (PM) holding ColumnStore storage]
• Clients connect to a User Module
• The User Module optimizes and controls the execution
• Data is distributed among the Performance Modules
• Data is stored, processed and managed by Performance Modules
• Performance Modules process query primitives in parallel
• The User Module combines the results from the Performance Modules
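
The division of labour between the User Module and the Performance Modules resembles a scatter-gather pattern, sketched below. This is an illustrative model only: the function names, the thread pool and the sample data are assumptions for this example, not ColumnStore internals.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def performance_module(partition):
    """Compute a partial SUM(quantity) GROUP BY item over locally stored rows."""
    partial = Counter()
    for item, quantity in partition:
        partial[item] += quantity
    return partial


def user_module(partitions):
    """Send the same primitive to every partition in parallel, then merge results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(performance_module, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds the partial sums together
    return dict(total)


# Rows of (item, quantity) distributed across three hypothetical Performance Modules.
distributed_rows = [
    [("Laptop", 5), ("Monitor", 5)],
    [("Mouse", 1)],
    [("Laptop", 3)],
]

totals = user_module(distributed_rows)
```

Each worker only sees its own partition, and the coordinator only sees small partial results, which is what lets the work spread across nodes.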

Storage Architecture
Data is stored column by column
Each column is stored in one or more extents
Each extent is represented by 1 file
Each extent is arranged in fixed size blocks
Extents are compressed (using Snappy)
Data is one of:
  Fixed size (1, 2, 4 or 8 bytes)
  Dictionary based with a fixed size pointer
Metadata is in an extent map
Extent map is in memory
Extent map contains metadata on each extent, like min and max
[Diagram: Column 1, Column 2, Column 3 ... Column N, each a vertical partition; every column is split into Extent 1 (8 million rows, 8 MB to 64 MB), Extent 2 (8 million rows) ... Extent M (8 million rows), each a horizontal partition]
Data automatically arranged by
• Column: acts as vertical partitioning
• Extents: act as horizontal partitioning
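
The relationship between extents and the extent map can be sketched like this. It is a minimal model under stated assumptions: extents hold three rows instead of 8 million, the sample dates are illustrative, and `build_extents` is a hypothetical helper.

```python
def build_extents(values, rows_per_extent):
    """Split one column into fixed-size extents and record min/max for each."""
    extents = []
    extent_map = []  # kept in memory, separate from the extent data
    for start in range(0, len(values), rows_per_extent):
        chunk = values[start:start + rows_per_extent]
        extents.append(chunk)
        extent_map.append(
            {"extent": len(extents), "min": min(chunk), "max": max(chunk)}
        )
    return extents, extent_map


# Illustrative ShipDate column; ISO date strings sort correctly as text.
ship_dates = [
    "2016-01-12", "2016-01-13", "2016-02-05",
    "2016-03-05", "2016-03-05", "2016-09-23",
    "2016-09-24", "2017-01-06",
]

extents, extent_map = build_extents(ship_dates, rows_per_extent=3)
```

The map stays tiny compared with the data, because it holds one entry per extent regardless of how many rows the extent contains.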

High Performance Query Processing
Storage architecture reduces I/O
• Only touch column files that are in filter, projection, group by, and join conditions
• Eliminate disk block touches to partitions outside filter and join conditions

SELECT Item, sum(Quantity) FROM Orders
WHERE ShipDate between '2016-01-01' and '2016-01-31'
GROUP BY Item

Extent 1 (horizontal partition, 8 million rows): ShipDate 2016-01-12 - 2016-03-05
Extent 2 (horizontal partition, 8 million rows): ShipDate 2016-03-05 - 2016-09-23 (ELIMINATED PARTITION)
Extent 3 (horizontal partition, 8 million rows): ShipDate 2016-09-24 - 2017-01-06 (ELIMINATED PARTITION)

Id     OrderId  Line  Item     Quantity  Price  Supplier  ShipDate    ShipMode
1      1        1     Laptop   5         1000   Dell      2016-01-12  G
2      1        2     Monitor  5         200    LG        2016-01-13  G
3      2        1     Mouse    1         20     Logitech  2016-02-05  M
4      3        1     Laptop   3         1600   Apple     2016-01-31  P
...    ...      ...   ...      ...       ...    ...       ...         ...
8M                                                        2016-03-05
8M+1                                                      2016-03-05
...    ...      ...   ...      ...       ...    ...       ...         ...
16M                                                       2016-09-23
16M+1                                                     2016-09-24
...    ...      ...   ...      ...       ...    ...       ...         ...
24M                                                       2017-01-06
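
Extent elimination for the query above comes down to an interval-overlap check against the min and max values in the extent map. The sketch below uses the three ShipDate ranges from the slide; the dictionary layout and the `extents_to_scan` function are assumptions made for this example.

```python
from datetime import date

# Min/max ShipDate per extent, as shown on the slide.
EXTENT_MAP = [
    {"extent": 1, "min": date(2016, 1, 12), "max": date(2016, 3, 5)},
    {"extent": 2, "min": date(2016, 3, 5), "max": date(2016, 9, 23)},
    {"extent": 3, "min": date(2016, 9, 24), "max": date(2017, 1, 6)},
]


def extents_to_scan(extent_map, low, high):
    """Return extents whose [min, max] range overlaps the filter range [low, high]."""
    return [
        entry["extent"]
        for entry in extent_map
        if entry["max"] >= low and entry["min"] <= high
    ]


# WHERE ShipDate BETWEEN '2016-01-01' AND '2016-01-31'
january_extents = extents_to_scan(EXTENT_MAP, date(2016, 1, 1), date(2016, 1, 31))
```

For the January filter only extent 1 overlaps, so extents 2 and 3 are skipped without reading any of their blocks.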

Operations
Transaction (OLTP)
Ingestion Analytics
Apache Kafka
Streaming Data Adapters
Spark / Python / ML
Bulk Data Adapters
MariaDB Server
InnoDB
MariaDB MaxScale
Web/Mobile App
MariaDB MaxScale
Analytics (OLAP)
Simple & Streamlined data ingestion
MariaDB Server
ColumnStore
Data Services
Bulk Data Adapters

Hybrid workloads: why scalability is needed
Outgrowing OLTP
Applications have transactional and analytical queries
1. Constrained by limited, lightweight analytics
2. Need full analytics to create competitive features
Using historical data
Applications with lots of customers, lots of transactions
1. Limited to current or recent transaction data (months)
2. Need access to all historical data (years)
Exposing analytics
SaaS customers are becoming data-driven organizations
1. They don’t have access to their own data
2. They need to analyze it in unknown/unexpected ways

Application
Connection 1
MariaDB
Server
Row storage
MariaDB Server
Columnar
storage
MariaDB
MaxScale

MariaDB Server
(Spider)
Application
Connection 1
MariaDB
Server 1
Row storage
MariaDB
Server 2
Row storage
MariaDB
Server n
Row storage
MariaDB Server
(ColumnStore)
Sharding
MariaDB
MaxScale
MariaDB Server
Columnar
storage

MariaDB Server
(Spider)
Application
Connection 1
MariaDB Server
(ColumnStore)
Node 1
Columnar
storage
Node 2
Columnar
storage
Node n
Columnar
storage
Distributed storage
MariaDB
MaxScale
MariaDB
Server
Row storage

MariaDB Server
(Spider)
Application
Connection 1
MariaDB
Server 1
Row storage
MariaDB
Server 2
Row storage
MariaDB
Server n
Row storage
MariaDB Server
(ColumnStore)
CS node 1
Columnar
storage
CS node 2
Columnar
storage
CS node n
Columnar
storage
Sharding Distributed storage
MariaDB
MaxScale
