MariaDB ColumnStore extends MariaDB Server, a relational database for transaction processing, with distributed columnar storage and parallel query processing for scalable, high-performance analytics. This session explains how MariaDB ColumnStore works and why more demanding analytical workloads need it.
7. Database workloads
Transactional
Current data
Range queries
Known queries
Row-based storage
Indexes
Clustered/Replicated
Analytical
Historical data
Aggregate queries
Unknown queries
Columnar storage
No indexes
Distributed
8. Existing Approaches
Limited real time analytics
Slow releases of product innovation
Expensive hardware and software
Data Warehouses
Hadoop / NoSQL
LIMITED SQL SUPPORT
DIFFICULT TO INSTALL/MANAGE
LIMITED TALENT POOL
DATA LAKE W/ NO DATA MANAGEMENT
Hard to use
10. Application (eCommerce)
Transactional: Show me all new products in the science fiction category
Analytical: Show me the top products added to shopping carts or purchased today, and with low inventory.
Actionable insight: I should buy one now because everyone wants one, and they’ll be sold out by the end of the day!
14. MariaDB TX 3.0
MariaDB Server 10.3
MariaDB MaxScale 2.2
InnoDB/MyRocks
MariaDB AX 2.0
MariaDB Server 10.2
MariaDB MaxScale 2.2
ColumnStore 1.2
MariaDB Platform X3
MariaDB MaxScale 2.3
MariaDB Server 10.3
InnoDB/MyRocks
MariaDB Server 10.3
ColumnStore 1.3
15. The database proxy inspects queries and routes them to transactional and/or analytical database instances.
MariaDB Platform X3
MariaDB MaxScale 2.3
CDC
MariaDB Server 10.3
InnoDB/MyRocks
MariaDB Server 10.3
ColumnStore 1.3
Transactional Analytical
16. The database proxy inspects queries and routes them to transactional and/or analytical database instances.
The change-data-capture stream replicates all writes from transactional databases to analytical databases within microbatches.
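The two ideas on this slide, query routing and microbatched change data capture, can be sketched in a few lines of Python. This is a toy illustration only: the `route` heuristic, the `MicrobatchCdc` class, and the batch size are hypothetical names and choices made for this sketch, not how MariaDB MaxScale or its CDC stream is implemented.

```python
import re

# Hypothetical heuristic: a SELECT with aggregates or GROUP BY is "analytical".
ANALYTICAL_HINT = re.compile(r"\b(GROUP\s+BY|SUM|AVG|COUNT|MIN|MAX)\b", re.IGNORECASE)


def route(sql):
    """Return the name of the backend a statement would be sent to."""
    is_select = sql.lstrip().upper().startswith("SELECT")
    if is_select and ANALYTICAL_HINT.search(sql):
        return "analytical"
    return "transactional"


class MicrobatchCdc:
    """Buffers captured writes and applies them to the analytical copy in batches."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = []          # changes captured but not yet applied
        self.analytical_rows = []  # stand-in for the analytical database
        self.batches_applied = 0

    def capture(self, change):
        self.pending.append(change)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.analytical_rows.extend(self.pending)
            self.pending = []
            self.batches_applied += 1


cdc = MicrobatchCdc(batch_size=3)
for order_id in range(1, 8):       # seven transactional writes
    cdc.capture({"order_id": order_id})
cdc.flush()                        # apply the final partial batch

print(route("SELECT Item, SUM(Quantity) FROM Orders GROUP BY Item"))  # analytical
print(route("INSERT INTO Orders VALUES (1, 'Laptop')"))               # transactional
print(cdc.batches_applied, len(cdc.analytical_rows))                  # 3 7
```

Batching trades a short replication delay for fewer, larger writes, which suits columnar storage better than row-at-a-time inserts.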
22. Row-oriented vs. Column-oriented format
● Row oriented:
○ Rows are stored sequentially in a file
○ Scans through every record, row by row
● Column oriented:
○ Each column is stored in a separate file
○ Scans only the relevant columns
ID Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
ID: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F
SELECT Fname FROM People WHERE State = 'NY'
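To make the difference concrete, here is a minimal Python sketch of the query above run against both layouts of the slide's sample table. The `values_read` counter is a toy cost model introduced for this illustration; real engines read blocks from disk, not individual values.

```python
COLUMNS = ["ID", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex"]

# Row-oriented layout: one tuple per record.
rows = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]

# Column-oriented layout: one list (standing in for a file) per column.
columns = {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def scan_rows(rows):
    """Row store: every field of every record is read."""
    values_read = 0
    result = []
    for row in rows:
        values_read += len(row)
        if row[3] == "NY":          # State
            result.append(row[1])   # Fname
    return result, values_read


def scan_columns(columns):
    """Column store: only the State and Fname files are read."""
    values_read = 0
    matches = []
    for position, state in enumerate(columns["State"]):
        values_read += 1
        if state == "NY":
            matches.append(position)
    result = []
    for position in matches:
        values_read += 1
        result.append(columns["Fname"][position])
    return result, values_read


row_result, row_cost = scan_rows(rows)
col_result, col_cost = scan_columns(columns)
print(row_result, row_cost)  # ['Bugs', 'Daffy'] 40
print(col_result, col_cost)  # ['Bugs', 'Daffy'] 7
```

Both layouts return the same answer; the column layout simply never touches the six columns the query does not mention.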
23. Single-Row Operations - Insert
Row oriented: new rows are appended to the end.
Column oriented: a new value is added to each file.
Key Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
6 Marvin Martian CA 91602 (818) 761-9964 26 M
Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F
Row 6 (inserted): 6 Marvin Martian CA 91602 (818) 761-9964 26 M
Columnar storage is not efficient for singleton inserts (OLTP), because each insert touches every column file rather than a single row. Batch loads into column-oriented storage are faster (compression, no indexes).
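The insert trade-off can be illustrated with a small Python sketch that counts append operations. The `RowTable` and `ColumnTable` classes and the `writes` counter are hypothetical, and the 1,000-row batch is made-up filler data; the point is only the relative number of file touches.

```python
COLUMNS = ["Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex"]


class RowTable:
    """Row store: a record is appended to the end in one write."""

    def __init__(self):
        self.records = []
        self.writes = 0

    def insert(self, row):
        self.records.append(row)
        self.writes += 1


class ColumnTable:
    """Column store: one list (standing in for a file) per column."""

    def __init__(self):
        self.files = {name: [] for name in COLUMNS}
        self.writes = 0

    def insert(self, row):
        # A single-row insert appends one value to every column file.
        for name, value in zip(COLUMNS, row):
            self.files[name].append(value)
            self.writes += 1

    def bulk_load(self, rows):
        # A batch load appends a whole run of values to each file at once.
        for index, name in enumerate(COLUMNS):
            self.files[name].extend(row[index] for row in rows)
            self.writes += 1


marvin = (6, "Marvin", "Martian", "CA", "91602", "(818) 761-9964", 26, "M")
row_table = RowTable()
column_table = ColumnTable()
row_table.insert(marvin)
column_table.insert(marvin)
print(row_table.writes, column_table.writes)  # 1 8

# Made-up filler rows, used only to compare write counts.
batch = [(key, "First", "Last", "CA", "91602", "(818) 761-9964", 26, "M")
         for key in range(7, 1007)]
bulk_table = ColumnTable()
bulk_table.bulk_load(batch)
print(bulk_table.writes, len(bulk_table.files["Key"]))  # 8 1000
```

Loading the same 1,000 rows one at a time would cost 8,000 appends in this model, which is why batch loading is the preferred ingestion path for columnar tables.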
24. Single-Row Operations - Update
Row oriented: updating 100% of rows means changing 100% of blocks on disk.
Column oriented: only the blocks that need to change are updated.
Key Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F
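As a rough illustration of the update case, the Python sketch below increments `Age` for every person in both layouts and counts how many stored values are rewritten. Counting values instead of disk blocks is a simplifying assumption of this sketch, and the function names are hypothetical.

```python
COLUMNS = ["Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex"]

rows = [
    [1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"],
    [2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"],
    [3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"],
    [4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"],
    [5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"],
]
columns = {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def update_age_row_store(rows):
    """Every record contains an Age, so every record is written back in full."""
    age_index = COLUMNS.index("Age")
    values_rewritten = 0
    for row in rows:
        row[age_index] += 1
        values_rewritten += len(row)
    return values_rewritten


def update_age_column_store(columns):
    """Only the Age file is rewritten; the other seven files are untouched."""
    columns["Age"] = [age + 1 for age in columns["Age"]]
    return len(columns["Age"])


row_cost = update_age_row_store(rows)
column_cost = update_age_column_store(columns)
print(row_cost, column_cost)  # 40 5
print(columns["Age"])         # [35, 53, 36, 44, 58]
```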
25. Single-Row Operations - Delete
Row oriented: the row is deleted.
Column oriented: the value is deleted from each file.
Key Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
6 Marvin Martian CA 91602 (818) 761-9964 26 M
Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F
Row 6 (deleted): 6 Marvin Martian CA 91602 (818) 761-9964 26 M
26. Changing the table structure
Row oriented: requires rebuilding the whole table.
Column oriented: creates a new file for the new column.
Column-oriented storage is very flexible when adding columns: no full table rebuild is required.
Key Fname Lname State Zip Phone Age Sex Active
1 Bugs Bunny NY 11217 (718) 938-3235 34 M Y
2 Yosemite Sam CA 95389 (209) 375-6572 52 M N
3 Daffy Duck NY 10013 (212) 227-1810 35 M N
4 Elmer Fudd ME 04578 (207) 882-7323 43 M Y
5 Witch Hazel MA 01970 (978) 744-0991 57 F N
Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F
Active: Y, N, N, Y, N
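Adding the `Active` column can be sketched the same way. In this simplified Python model (hypothetical function names, only two of the existing columns shown for brevity), the row layout rebuilds every record while the column layout just creates one more list.

```python
# Only Key and Fname are shown; the other columns behave identically.
records = [(1, "Bugs"), (2, "Yosemite"), (3, "Daffy"), (4, "Elmer"), (5, "Witch")]
files = {
    "Key": [1, 2, 3, 4, 5],
    "Fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch"],
}
active = ["Y", "N", "N", "Y", "N"]


def add_column_row_store(records, values):
    """Rebuild the table: every record is rewritten with one more field."""
    rebuilt = [record + (value,) for record, value in zip(records, values)]
    return rebuilt, len(rebuilt)  # new table, number of records rewritten


def add_column_column_store(files, name, values):
    """Create one new file; the existing column files are left as they are."""
    files[name] = list(values)
    return 0  # number of existing files rewritten


key_file_before = files["Key"]
rebuilt, records_rewritten = add_column_row_store(records, active)
files_rewritten = add_column_column_store(files, "Active", active)

print(records_rewritten, files_rewritten)  # 5 0
print(rebuilt[0])                          # (1, 'Bugs', 'Y')
print(files["Key"] is key_file_before)     # True
```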
27. Easier Enterprise Analytics
Single SQL Front-end
• Use a single SQL interface for analytics and OLTP
• Leverage MariaDB security features: encryption for data in motion, role-based access and auditing
Full ANSI SQL
• No more SQL-“like” queries
• Supports complex joins, aggregations and window functions
Easy to manage and scale
• Eliminates the need for indexes and views
• Automated horizontal/vertical partitioning
• Scales linearly by adding new nodes as data grows
• Out-of-the-box connectivity with BI tools
• 90.3% cost reduction per TB per year
28. Faster, More Efficient Queries
Parallel Query Processing
Optimized for Columnar storage
• Columnar storage reduces disk I/O
• Blazing-fast read-intensive workloads
• Ultra-fast data import
Parallel distributed query execution
• Distributes queries into a series of parallel operations
• Fully parallel, high-speed data ingestion
• TPC-H lineitem table: 750K to 1 million rows per second
Highly available analytic environment
• Built-in Redundancy
• Automatic fail-over
MariaDB AX customers across industries: Auto Parts, Finance, Ad analytics, Asset
management, Telecommunication, Healthcare, Digital Media, Carpooling App
29. MariaDB Analytics
[Architecture diagram: Clients connect through two MariaDB MaxScale instances to three MariaDB Server / ColumnStore nodes (UM, User Module), which sit on top of four ColumnStore Storage nodes (PM, Performance Module).]
• Clients connect to a User Module
• The User Module optimizes and controls the execution
• Data is distributed among the Performance Modules
• Data is stored, processed and managed by Performance Modules
• Performance Modules process query primitives in parallel
• The User Module combines the results from the Performance Modules
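The division of labour between the two module types resembles a scatter-gather pattern, which can be sketched in Python as follows. The functions `performance_module` and `user_module`, the thread pool, and the sample data are assumptions of this sketch; ColumnStore's actual processes and query primitives are considerably more involved.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each Performance Module holds one slice of (item, quantity) rows.
pm_slices = [
    [("Laptop", 5), ("Monitor", 5)],
    [("Mouse", 1), ("Laptop", 3)],
    [("Monitor", 2)],
]


def performance_module(rows):
    """Compute a partial SUM(quantity) GROUP BY item over the local slice."""
    partial = Counter()
    for item, quantity in rows:
        partial[item] += quantity
    return partial


def user_module(slices):
    """Send the same work to every slice in parallel, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(performance_module, slices))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return dict(total)


totals = user_module(pm_slices)
print(totals["Laptop"], totals["Monitor"], totals["Mouse"])  # 8 7 1
```

Because each slice produces only a small partial aggregate, the combining step stays cheap even as more slices are added.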
30. Storage Architecture
Data is stored column by column
Each column is stored in one or more extents
Each extent is represented by 1 file
Each extent is arranged in fixed size blocks
Extents are compressed (using Snappy)
Data is one of
Fixed size (1, 2, 4 or 8 bytes)
Dictionary based with a fixed size pointer
Meta data is in an extent map
Extent map is in memory
Extent map contains meta data on each
extent, like min and max
Column 1
Extent 1 (8 million rows, 8MB~64MB)
Extent 2 (8 million rows)
Extent M (8 million rows)
Column 2 Column 3 ... Column N
Data automatically arranged by
• Columns: act as vertical partitions
• Extents: act as horizontal partitions
[Diagram: each column forms a vertical partition; each extent forms a horizontal partition.]
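The "dictionary based with a fixed size pointer" idea from this slide can be illustrated with a short Python sketch. The 4-byte little-endian token width and the function names are assumptions chosen for the example, not ColumnStore's on-disk format.

```python
import struct

TOKEN_WIDTH = 4  # bytes per pointer; an assumption made for this sketch


def dictionary_encode(values):
    """Store each distinct string once and replace values with fixed-size tokens."""
    dictionary = []
    positions = {}
    tokens = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        tokens.append(positions[value])
    packed = struct.pack(f"<{len(tokens)}I", *tokens)
    return dictionary, packed


def dictionary_decode(dictionary, packed):
    """Unpack the fixed-size tokens and look each one up in the dictionary."""
    count = len(packed) // TOKEN_WIDTH
    tokens = struct.unpack(f"<{count}I", packed)
    return [dictionary[token] for token in tokens]


states = ["NY", "CA", "NY", "ME", "MA"]
dictionary, packed = dictionary_encode(states)
print(dictionary)    # ['NY', 'CA', 'ME', 'MA']
print(len(packed))   # 20
print(dictionary_decode(dictionary, packed) == states)  # True
```

Fixed-size entries mean the position of row N inside a column file can be computed directly, which is what makes per-column files practical to scan and to address.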
31. High Performance Query Processing
Extent 1: horizontal partition, 8 million rows
Extent 2: horizontal partition, 8 million rows
Extent 3: horizontal partition, 8 million rows
Storage architecture reduces I/O
• Only touches column files that appear in filter, projection, GROUP BY, and join conditions
• Eliminates disk block touches for partitions outside the filter and join conditions
Extent 1:
ShipDate: 2016-01-12 - 2016-03-05
Extent 2:
ShipDate: 2016-03-05 - 2016-09-23
Extent 3:
ShipDate: 2016-09-24 - 2017-01-06
SELECT Item, SUM(Quantity) FROM Orders
WHERE ShipDate BETWEEN '2016-01-01' AND '2016-01-31'
GROUP BY Item;
Id OrderId Line Item Quantity Price Supplier ShipDate ShipMode
1 1 1 Laptop 5 1000 Dell 2016-01-12 G
2 1 2 Monitor 5 200 LG 2016-01-13 G
3 2 1 Mouse 1 20 Logitech 2016-02-05 M
4 3 1 Laptop 3 1600 Apple 2016-01-31 P
... ... ... ... ... ... ... ... ...
8M 2016-03-05
8M+1 2016-03-05
... ... ... ... ... ... ... ... ...
16M 2016-09-23
16M+1 2016-09-24
... ... ... ... ... ... ... ... ...
24M 2017-01-06
Extent 2: ELIMINATED PARTITION
Extent 3: ELIMINATED PARTITION
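Extent elimination on this slide comes down to comparing the query's date range with the minimum and maximum stored for each extent. A minimal Python sketch, using the ShipDate ranges shown above and a hypothetical `extents_to_scan` helper:

```python
from datetime import date

# Min/max ShipDate per extent, as listed on the slide.
extent_map = [
    {"extent": 1, "min": date(2016, 1, 12), "max": date(2016, 3, 5)},
    {"extent": 2, "min": date(2016, 3, 5), "max": date(2016, 9, 23)},
    {"extent": 3, "min": date(2016, 9, 24), "max": date(2017, 1, 6)},
]


def extents_to_scan(extent_map, low, high):
    """Keep only extents whose [min, max] range overlaps the filter range [low, high]."""
    return [
        entry["extent"]
        for entry in extent_map
        if entry["max"] >= low and entry["min"] <= high
    ]


# WHERE ShipDate BETWEEN '2016-01-01' AND '2016-01-31'
january = extents_to_scan(extent_map, date(2016, 1, 1), date(2016, 1, 31))
print(january)  # [1]
```

Extents 2 and 3 are skipped without reading any of their blocks, because the in-memory extent map alone shows they cannot contain January 2016 ship dates.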
37. Hybrid workloads: why scalability is needed
Outgrowing OLTP: applications have transactional and analytical queries
1. Constrained by limited, lightweight analytics
2. Need full analytics to create competitive features
Using historical data: applications with lots of customers, lots of transactions
1. Limited to current or recent transaction data (months)
2. Need access to all historical data (years)
Exposing analytics: SaaS customers are becoming data-driven organizations
1. They don’t have access to their own data
2. They need to analyze it in unknown/unexpected ways