MariaDB ColumnStore - LONDON MySQL Meetup

ColumnStore
Open Source Engine for Analytics/BI
Bruno Šimić
Solutions Engineer

Data and big data
OLTP vs. OLAP

OLTP
On-line Transaction Processing
• large number of short on-line transactions (INSERT, UPDATE, DELETE)
• very fast query processing, maintaining data integrity in multi-access environments
• effectiveness measured by number of transactions per second
• operational (detailed and current) data
• OLTP systems are the original data source

OLAP
On-line Analytical Processing
• characterized by low volume of concurrent transactions
• complex queries, often involving aggregations
• response time is the effectiveness measure
• data is aggregated, historical and stored in multi-dimensional schemas
• OLAP data comes from the various OLTP databases
Rows/DataSize Scope
[Figure: scale of row counts (1 to 100,000,000,000) and data sizes (10-100GB, 100-1000GB, 1-10TB, 10-100TB ... PB), positioning MariaDB for OLTP and MariaDB ColumnStore for OLAP]

Big Data Analytics
1. Descriptive Analytics: What is happening?
   Traditional OLAP
2. Diagnostic Analytics: Why did it happen?
3. Predictive Analytics: What is likely to happen?
4. Prescriptive Analytics: What should I do about it?

[Figure: diagram with the labels Data Collection (Social Media, Sensors, Biometrics, Mobile), MariaDB MaxScale, MariaDB ColumnStore (Data Processing), Connectors, SPARK Integration etc., BI Tools and Data Science Applications; Transactional, Operational, Analytics, Insight]

MariaDB ColumnStore Architecture
[Figure: applications (BI Tool, SQL Client, Custom Big Data App) connect to the User Module (UM), the MariaDB SQL Front End; the Performance Modules (PM) form the Distributed Query Engine; Data Storage is Columnar Distributed Data Storage on Local Storage | SAN | EBS | Gluster FS]

User Modules
MariaDB SQL Front End: User Module (UM)
• mysqld - The MariaDB server
• ExeMgr - MariaDB’s interface to ColumnStore
• cpimport - high-performance data import

Query Processing - UM
• SQL operations are translated into thousands of Primitives
• Parallel/Distributed SQL
• Extensible with Parallel/Distributed UDFs
• Query is parsed by mysqld on the UM node
• Parsed query is handed over to ExeMgr on the UM node
• ExeMgr breaks down the query into primitive operations

Performance Modules
Distributed Query Engine: Performance Module (PM)
• PrimProc - Primitives Processor
• WriteEngineServ - Database file writing processor
• DMLProc - DML writes processor
• DDLProc - DDL processor

Query Processing - PM
• Primitives are processed on PM
• One thread working on a range of rows
• Typically 1/2 million rows, stored in a few hundred blocks of data
• Execute all column operations required (restriction and projection)
• Execute any group by/aggregation against local data
• Each primitive executes in a fraction of a second
• Primitives are run in parallel and fully distributed
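
The PM behaviour listed above (one thread per range of rows, filtering and aggregating against local data, primitives running in parallel) can be pictured with a small Python model. This is only an illustrative sketch, not ColumnStore code: the function names `aggregate_range` and `merge_partials`, the sample data and the tiny `RANGE_SIZE` are assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical column data: two column "files" with values at matching offsets.
state = ["NY", "CA", "NY", "ME", "MA", "CA", "NY", "ME"]
age = [34, 52, 35, 43, 57, 26, 41, 38]

RANGE_SIZE = 3  # stand-in for the ~1/2 million rows handled per primitive


def aggregate_range(start):
    """One 'primitive': filter and aggregate a single range of rows locally."""
    partial = Counter()
    for i in range(start, min(start + RANGE_SIZE, len(state))):
        if age[i] >= 35:            # restriction
            partial[state[i]] += 1  # local GROUP BY state, COUNT(*)
    return partial


def merge_partials(partials):
    """Final step: combine the partial aggregates from all ranges."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Each range is handled by its own worker; results are merged afterwards.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(aggregate_range, range(0, len(state), RANGE_SIZE)))

result = merge_partials(partials)
```

Because every range is aggregated independently, adding more workers (or, in the real system, more PM nodes) only changes how many ranges run at once, not the final merge step.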

Storage Architecture
• Columnar storage
– Each column stored as a separate file
– No index management for query performance tuning
– Online schema changes: add new column without impacting running queries
• Automatic horizontal partitioning
– Logical partition every 8 million rows
– In-memory metadata of partition min and max
– No partition management for query performance tuning
• Compression
– Accelerate decompression rate
– Reduce I/O for compressed blocks

Data automatically arranged by
• Column – acts as vertical partitioning
• Extents – act as horizontal partitioning

[Figure: Column 1, Column 2, Column 3 ... Column N are vertical partitions; each column is divided into Extent 1 (8 million rows, 8MB to 64MB), Extent 2 (8 million rows) ... Extent M (8 million rows), which are horizontal partitions]
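
As a rough illustration of the layout described above, the following Python sketch stores each column separately, cuts it into fixed-size extents and keeps min/max metadata per extent. The `Extent` class, the `build_column` helper and the 4-row extent size are hypothetical stand-ins chosen for readability; the real extent size on the slide is 8 million rows.

```python
from dataclasses import dataclass

EXTENT_ROWS = 4  # stand-in for the 8 million rows per extent


@dataclass
class Extent:
    values: list       # the column values stored in this extent
    min_value: object  # in-memory metadata kept per extent
    max_value: object


def build_column(values, extent_rows=EXTENT_ROWS):
    """Split one column into fixed-size extents and record min/max for each."""
    extents = []
    for start in range(0, len(values), extent_rows):
        chunk = values[start:start + extent_rows]
        extents.append(Extent(chunk, min(chunk), max(chunk)))
    return extents


# Each column of the table is stored on its own (vertical partitioning),
# and each column is cut into extents (horizontal partitioning).
table = {
    "key": build_column(list(range(1, 11))),
    "state": build_column(
        ["NY", "CA", "NY", "ME", "WY", "TX", "OR", "VA", "TN", "IA"]
    ),
}
```

No index is built here: the min/max pair per extent is the only metadata, which matches the "no index management" point on the slide.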

High Performance Data Ingestion
• Fully parallel high speed data load
– Parallel data loads on all PMs simultaneously
– Multiple tables can be loaded simultaneously
– Read queries continue without being blocked
• Micro-batch loading for real-time data flow

[Figure: Column 1, Column 2 ... Column N, each divided into Extent 1 (8 million rows, 8MB to 64MB), Extent 2 (8 million rows) ... Extent M (8 million rows) as horizontal partitions; a High Water Mark separates the data accessed by running queries from the new data being loaded]
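
The high water mark idea can be sketched in a few lines of Python: new data is written past the mark, readers only scan up to it, and publishing a load means moving the mark. This is a simplified model under assumed names (`ColumnFile`, `append_batch`, `commit`, `rollback`, `scan`), not a description of how ColumnStore implements it internally.

```python
import threading


class ColumnFile:
    """Toy column file: readers only see rows below the high water mark."""

    def __init__(self, values=()):
        self._values = list(values)
        self._hwm = len(self._values)  # high water mark
        self._lock = threading.Lock()

    def append_batch(self, batch):
        """Write new rows beyond the high water mark (not yet visible)."""
        self._values.extend(batch)

    def commit(self):
        """Advance the high water mark so the new rows become visible."""
        with self._lock:
            self._hwm = len(self._values)

    def rollback(self):
        """Discard everything written beyond the high water mark."""
        del self._values[self._hwm:]

    def scan(self):
        """Return only the rows a running query is allowed to see."""
        with self._lock:
            hwm = self._hwm
        return self._values[:hwm]


col = ColumnFile([1, 2, 3])
col.append_batch([4, 5])
visible_during_load = col.scan()   # the load is still in progress
col.commit()
visible_after_commit = col.scan()  # the load has been published
```

In this model a reader never waits for the loader, which is one way to picture why read queries are not blocked during a bulk load.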

Column-oriented storage
Differences to row-oriented storage

Row-oriented vs. Column-oriented format
Row oriented: rows stored sequentially in a file.
Column oriented: each column is stored in a separate file. Each column for a given row is at the same offset.

Row-oriented:
Key Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F

Column-oriented:
Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F
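
The two layouts can be compared directly in Python. The sketch below turns the sample rows into one list per column and rebuilds a row by reading every column at the same offset; the helper name `fetch_row` is an assumption for the example, and in-memory lists stand in for the files.

```python
HEADER = ("Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex")

# Row-oriented: one sequence of complete rows.
rows = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]

# Column-oriented: one list ("file") per column, same offset for the same row.
columns = {name: [row[i] for row in rows] for i, name in enumerate(HEADER)}


def fetch_row(columns, offset):
    """Re-assemble a row by reading every column at the same offset."""
    return tuple(columns[name][offset] for name in HEADER)


# A query that needs only State reads a single list, not whole rows.
states = columns["State"]
```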

Single-Row Operations - Insert
Row oriented: new rows appended to the end.
Column oriented: new value added to each file.

Row-oriented (row 6 appended):
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
6 Marvin Martian CA 91602 (818) 761-9964 26 M

Column-oriented (before the insert; row 6 adds one value to each file):
Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F

Columnar insert is not efficient for singleton insertions (OLTP): a single-row insert touches one row in a row store but every column file in a column store. Batch load on column-oriented storage is faster (compression, no indexes).
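
To make the cost difference concrete, the following Python sketch counts how many "files" each kind of insert touches. It is a deliberately simplified model: the function names are hypothetical, lists stand in for files, and real costs also depend on compression and block layout, which are not modelled here.

```python
COLUMNS = ("Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex")


def insert_row_store(row_file, row):
    """Row store: one append to the single file. Returns files touched."""
    row_file.append(row)
    return 1


def insert_column_store(column_files, row):
    """Column store: one append per column file. Returns files touched."""
    for name, value in zip(COLUMNS, row):
        column_files[name].append(value)
    return len(COLUMNS)


def batch_load_column_store(column_files, rows):
    """Batch load: each column file is extended once for the whole batch."""
    for i, name in enumerate(COLUMNS):
        column_files[name].extend(row[i] for row in rows)
    return len(COLUMNS)


row_file = []
column_files = {name: [] for name in COLUMNS}
marvin = (6, "Marvin", "Martian", "CA", "91602", "(818) 761-9964", 26, "M")

row_touches = insert_row_store(row_file, marvin)            # 1 file
column_touches = insert_column_store(column_files, marvin)  # 8 files
```

In this model the per-file cost of a batch load is paid once per column regardless of how many rows the batch holds, which is why batching amortises the overhead that makes singleton inserts expensive.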

Single-Row Operations - Delete
Row oriented: row deleted.
Column oriented: value deleted from each file.

Row-oriented:
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F

Column-oriented:
Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F

Recommended: use Partition Drop, which removes the column data of a whole partition in bulk.

Single-Row Operations - Update
Row oriented: updating 100% of rows means changing 100% of blocks on disk.
Column oriented: only the blocks that need to be updated are changed.

Row-oriented:
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F

Column-oriented:
Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F

Changing the table structure
Row oriented: requires rebuilding of the whole table.
Column oriented: create a new file for the new column.

Row-oriented (new column Active):
Key Fname Lname State Zip Phone Age Sex Active
1 Bugs Bunny NY 11217 (718) 938-3235 34 M Y
2 Yosemite Sam CA 95389 (209) 375-6572 52 M N
3 Daffy Duck NY 10013 (212) 227-1810 35 M N
4 Elmer Fudd ME 04578 (207) 882-7323 43 M Y
5 Witch Hazel MA 01970 (978) 744-0991 57 F N

Column-oriented:
Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F
Active: Y, N, N, Y, N

Column-oriented storage is very flexible for adding columns; no full table rebuild is required.
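
In a column-per-file model, adding a column amounts to creating one new file while the existing ones stay as they are. A minimal Python sketch of that idea follows; the `add_column` helper is hypothetical and dictionaries of lists stand in for the column files.

```python
columns = {
    "Key": [1, 2, 3, 4, 5],
    "Fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch"],
    "State": ["NY", "CA", "NY", "ME", "MA"],
}


def add_column(columns, name, values):
    """Add a column by creating one new 'file'; existing columns are untouched."""
    row_count = len(next(iter(columns.values())))
    if len(values) != row_count:
        raise ValueError("new column must have one value per existing row")
    columns[name] = list(values)


fname_before = columns["Fname"]  # keep a reference to check it is not rewritten
add_column(columns, "Active", ["Y", "N", "N", "Y", "N"])
```

A row-oriented layout would instead have to rewrite every stored row to make room for the new field.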

High Performance Query Processing
Storage Architecture reduces I/O
• Only touch column files that are in projection, filter and join conditions
• Eliminate disk block touches to partitions outside filter and join conditions

SELECT Fname FROM Table1 WHERE State = 'NY'

Extent 1 (horizontal partition, 8 million rows, IDs 1 to 8M): Min State: CA, Max State: NY
Extent 2 (horizontal partition, 8 million rows, IDs 8M+1 to 16M): Min State: OR, Max State: WY
Extent 3 (horizontal partition, 8 million rows, IDs 16M+1 to 24M): Min State: IA, Max State: TN

NY lies within the State range of Extents 1 and 3 but outside the range OR to WY, so Extent 2 does not need to be read.

[Figure: table with the columns ID, Fname, Lname, State, Zip, Phone, Age, Sex, each a vertical partition, split into three extents; sample State values are NY, CA, NY, ME ... MN in Extent 1, WY, TX, OR ... VA in Extent 2 and TN, IA, NY ... PA in Extent 3; the partition that cannot match the filter is marked ELIMINATED PARTITION]
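
The elimination step can be reproduced with the min/max values given for the three extents. The Python sketch below is an assumption-level model (the names `extent_ranges` and `extents_to_scan` are made up for the example): an extent has to be scanned only if the filter value can fall inside its [min, max] range.

```python
# Min/max State per extent, as given on the slide.
extent_ranges = {
    1: ("CA", "NY"),
    2: ("OR", "WY"),
    3: ("IA", "TN"),
}


def extents_to_scan(ranges, value):
    """Keep only the extents whose [min, max] range can contain the value."""
    return [
        extent_id
        for extent_id, (low, high) in ranges.items()
        if low <= value <= high
    ]


# WHERE State = 'NY': extent 2 (OR to WY) cannot contain NY and is skipped.
to_scan = extents_to_scan(extent_ranges, "NY")
eliminated = [e for e in extent_ranges if e not in to_scan]
```

Only the Fname and State column files of the remaining extents have to be read for the query shown, since no other column appears in the projection or the filter.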

Analytics
– In-database distributed analytics with complex join, aggregation, window functions
– Extensible UDF for custom analytics
– Cross Engine Join with other storage engines

Window functions
– PARTITION BY / ORDER BY
– Aggregate functions: MAX, MIN, COUNT, SUM, AVG, STD, STDDEV_SAMP, STDDEV_POP, VAR_SAMP, VAR_POP
– Ranking: ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK, CUME_DIST, NTILE, PERCENTILE, PERCENTILE_CONT, PERCENTILE_DISC, MEDIAN

Daily Running Average Revenue for each item
SELECT item_id, server_date, revenue,
AVG(revenue) OVER
(PARTITION BY item_id ORDER BY server_date
RANGE INTERVAL '1' DAY PRECEDING) running_avg
FROM web_item_sales

Item ID  Server_date  Revenue    Running Average
1        02-01-2014   20,000.00  20,000.00
1        02-02-2014    5,001.00  12,500.50
2        02-01-2014   15,000.00  15,000.00
2        02-04-2014   34,029.00  34,029.00
2        02-05-2014    7,138.00  20,583.50
3        02-01-2014   17,250.00  17,250.00
3        02-03-2014   25,010.00  25,010.00
3        02-04-2014   21,034.00  23,022.00
3        02-05-2014    4,120.00  12,577.00
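
The window in the query covers, for each row, the same item's rows dated from one day earlier up to the current day. The Python sketch below recomputes that average from the revenue figures in the table, assuming the dates are in MM-DD-YYYY format; the function name `running_avg` is chosen for the example and the code is a plain re-implementation, not how ColumnStore evaluates the window.

```python
from datetime import date, timedelta

# (item_id, server_date, revenue) taken from the sample table.
sales = [
    (1, date(2014, 2, 1), 20000.00),
    (1, date(2014, 2, 2), 5001.00),
    (2, date(2014, 2, 1), 15000.00),
    (2, date(2014, 2, 4), 34029.00),
    (2, date(2014, 2, 5), 7138.00),
    (3, date(2014, 2, 1), 17250.00),
    (3, date(2014, 2, 3), 25010.00),
    (3, date(2014, 2, 4), 21034.00),
    (3, date(2014, 2, 5), 4120.00),
]


def running_avg(rows, days=1):
    """AVG(revenue) per item over the current day and `days` preceding days."""
    result = []
    for item, day, _ in rows:
        window = [
            revenue
            for other_item, other_day, revenue in rows
            if other_item == item
            and day - timedelta(days=days) <= other_day <= day
        ]
        result.append(sum(window) / len(window))
    return result


averages = running_avg(sales)
# averages == [20000.0, 12500.5, 15000.0, 34029.0, 20583.5,
#              17250.0, 25010.0, 23022.0, 12577.0]
```

A row whose previous calendar day has no sale (for example item 2 on 02-04-2014) has a window of one row, so its running average equals its own revenue.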

MariaDB ColumnStore
Best practices

Best Practices
General
• Not suited for OLTP, needs big data to process fast (millions of records)
• Micro-batch load allows near real-time behaviour
• Infrequently used columns do not impact other queries
• Columnar suitable for sparse columns (nulls compress nicely)

Query Modeling
• Star-schema optimizations are generally a good idea
• Conservative data typing is important
– fixed-length vs. dictionary boundary (8 bytes)
– IP Address vs. IP Number
• Break down compound fields into individual fields
– Trivializes searching for sub-fields
– Can avoid dictionary overhead
– Cost to re-assemble is generally small
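
Two of the modeling tips above lend themselves to a short example: storing an IP address as a number so it fits a small fixed-length type, and splitting a compound field into sub-fields. The Python sketch below uses the standard `ipaddress` and `re` modules; the helper names and the phone format (taken from the sample table) are assumptions for illustration.

```python
import ipaddress
import re


def ip_to_number(address):
    """Convert dotted IPv4 text (up to 15 characters) to a 4-byte integer."""
    return int(ipaddress.IPv4Address(address))


def number_to_ip(number):
    """Re-assemble the dotted text form when it is needed for display."""
    return str(ipaddress.IPv4Address(number))


PHONE_PATTERN = re.compile(r"\((\d{3})\) (\d{3})-(\d{4})")


def split_phone(phone):
    """Break '(718) 938-3235' into area code, exchange and line number."""
    match = PHONE_PATTERN.fullmatch(phone)
    if match is None:
        raise ValueError(f"unexpected phone format: {phone!r}")
    return match.groups()


ip_number = ip_to_number("192.168.0.1")     # 3232235521
area, exchange, line = split_phone("(718) 938-3235")
```

With the area code in its own column, a search for one area code becomes a simple equality filter instead of a pattern match on the full string, and each short sub-field stays below the 8-byte boundary mentioned on the slide.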

Data Ingestion
cpimport
• Fastest way to load data from a CSV file, standard input or a binary source file
• Multiple tables can be loaded in parallel by launching multiple jobs
• Read queries continue without being blocked
• A successful cpimport is auto-committed
• In case of errors, the entire load is rolled back

LOAD DATA INFILE
• Traditional way of importing data into any MariaDB storage engine table
• Up to 2 times slower than cpimport for large imports
• The operation can be rolled back whether it succeeds or fails

High Availability
HA at UM node
• When one UM node goes down, another UM node takes over

HA at PM node
• SAN/AWS EBS - When a PM node goes down, the data volumes attached to the failed PM node get attached to another PM
• Local Disks - If a PM node goes down, the data on its disks is not available, though queries continue on the remaining data set

HA at Data Storage
• AWS EBS (Elastic Block Store)
• GlusterFS - Multiple copies of each data block across storage. If a disk on a PM node fails, another PM node will have access to the copy of the data

Where to find MariaDB ColumnStore?
SOFTWARE DOWNLOAD https://mariadb.com/downloads/columnstore
SOURCE https://github.com/mariadb-corporation/mariadb-columnstore-engine
DOCUMENTATION https://mariadb.com/kb/en/mariadb/mariadb-columnstore/
BLOGS https://mariadb.com/blog-tags/columnstore

Thank you
Bruno Šimić
bruno@mariadb.com
