3. MariaDB ColumnStore
• GPLv2 Open Source
• Columnar, Massively Parallel MariaDB Storage Engine
• Scalable, high-performance analytics platform
• Built-in redundancy and high availability
• Runs on premises or in the AWS cloud
• Full SQL syntax and capabilities regardless of platform
[Diagram: Big Data Sources → ELT Tools → MariaDB ColumnStore (Node 1, Node 2, Node 3, ... Node N on Local / AWS® / GlusterFS® storage) → BI Tools → Analytics Insight]
4. MariaDB ColumnStore
High-performance columnar storage engine that supports a wide variety of analytical use cases with SQL in highly scalable distributed environments
• Parallel query processing for distributed environments: Faster, More Efficient Queries
• Single SQL Interface for OLTP and analytics: Easier Enterprise Analytics
• Power of SQL and Freedom of Open Source to Big Data Analytics: Better Price Performance
5. Workload – Query Vision/Scope
Suited for reporting or analysis of millions to billions of rows from data sets containing millions to trillions of rows.
[Chart: workloads ranging from OLTP/NoSQL Workloads to OLAP/Analytic/Reporting Workloads, on a scale from 1 to 10,000,000,000 and data set sizes from 10-100GB through 100-1,000GB to 1-10TB]
6. Row-oriented vs. Column-oriented format
• Row oriented:
– Rows stored sequentially in a file
– Scans through every record row by row
• Column oriented:
– Each column is stored in a separate file
– Scans only the relevant columns
ID Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
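Column-oriented layout of the same table (one file per column):
ID: 1 | 2 | 3 | 4 | 5
Fname: Bugs | Yosemite | Daffy | Elmer | Witch
Lname: Bunny | Sam | Duck | Fudd | Hazel
State: NY | CA | NY | ME | MA
Zip: 11217 | 95389 | 10013 | 04578 | 01970
Phone: (718) 938-3235 | (209) 375-6572 | (212) 227-1810 | (207) 882-7323 | (978) 744-0991
Age: 34 | 52 | 35 | 43 | 57
Sex: M | M | M | M | F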
SELECT Fname FROM People WHERE State = 'NY'
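To make the I/O difference concrete, here is a toy Python model (not ColumnStore code) of the People table in both layouts. It counts how many stored values each layout reads to answer SELECT Fname FROM People WHERE State = 'NY'. The counting rule (a row store reads every field of every row; a column store reads only the State and Fname column files) is a simplifying assumption that ignores blocks, caching and compression.

```python
# Toy model of row- vs column-oriented layouts for the People table.
# Illustrative only: real engines read fixed-size blocks, not single values.

COLUMNS = ["ID", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex"]
ROWS = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]

# Column-oriented layout: one list ("file") per column.
COLUMN_FILES = {name: [row[i] for row in ROWS] for i, name in enumerate(COLUMNS)}


def row_store_query(rows):
    """Scan row by row; every field of every record is read."""
    state_idx, fname_idx = COLUMNS.index("State"), COLUMNS.index("Fname")
    values_read, result = 0, []
    for row in rows:
        values_read += len(row)              # the whole record is read
        if row[state_idx] == "NY":
            result.append(row[fname_idx])
    return result, values_read


def column_store_query(files):
    """Scan only the two relevant column files: State (filter), Fname (projection)."""
    states, fnames = files["State"], files["Fname"]
    values_read = len(states) + len(fnames)
    result = [f for s, f in zip(states, fnames) if s == "NY"]
    return result, values_read


print(row_store_query(ROWS))             # (['Bugs', 'Daffy'], 40)
print(column_store_query(COLUMN_FILES))  # (['Bugs', 'Daffy'], 10)
```

Both layouts return the same rows; the column layout touches 2 of the 8 columns, and that ratio is what grows into large I/O savings on wide tables.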
7. Analytics
• In-database distributed analytics with complex joins, aggregations and window functions
• Cross Engine Join allows queries to reference both ColumnStore and non-ColumnStore tables
• Extensible User Defined Functions allow creation of specialized logic executed at the PM level
• Standard MariaDB Connectors provide out-of-the-box integration with:
– BI Tools (Tableau, Pentaho, ..)
– Custom Application Code (Java, Scala, C#, Python, ..)
– Data Processing Frameworks (R, Spark, NumPy, ..)
Window Function Example: Daily Running Average Revenue by Item
Item ID | Server_date | Revenue | Running Average
1 | 2017-02-01 | 20,000.00 | 20,000.00
1 | 2017-02-02 | 5,001.00 | 12,500.50
2 | 2017-02-01 | 15,000.00 | 15,000.00
2 | 2017-02-04 | 34,029.00 | 34,029.00
2 | 2017-02-05 | 7,138.00 | 20,583.50
3 | 2017-02-01 | 17,250.00 | 17,250.00
3 | 2017-02-03 | 25,010.00 | 25,010.00
3 | 2017-02-04 | 21,034.00 | 23,022.00
3 | 2017-02-05 | 4,120.00 | 12,577.00
SELECT item_id, server_date, revenue,
AVG(revenue) OVER
(PARTITION BY item_id ORDER BY server_date
RANGE INTERVAL 1 DAY PRECEDING) AS running_avg
FROM web_item_sales
ORDER BY item_id, server_date;
[Diagram: BI Tool, Custom Big Data App and Data Processing Framework connect through JDBC / ODBC / Connector]
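The running-average column can be checked by hand. The Python sketch below (an illustration, not how ColumnStore evaluates window functions) applies the same frame, RANGE INTERVAL 1 DAY PRECEDING: for each row, it averages the revenue of rows for the same item whose date lies between one day before the row's date and the row's date itself.

```python
from collections import defaultdict
from datetime import date, timedelta

# (item_id, server_date, revenue) rows from the example table
SALES = [
    (1, date(2017, 2, 1), 20000.0),
    (1, date(2017, 2, 2), 5001.0),
    (2, date(2017, 2, 1), 15000.0),
    (2, date(2017, 2, 4), 34029.0),
    (2, date(2017, 2, 5), 7138.0),
    (3, date(2017, 2, 1), 17250.0),
    (3, date(2017, 2, 3), 25010.0),
    (3, date(2017, 2, 4), 21034.0),
    (3, date(2017, 2, 5), 4120.0),
]


def running_avg(rows, window=timedelta(days=1)):
    """AVG(revenue) OVER (PARTITION BY item_id ORDER BY server_date
    RANGE <window> PRECEDING); the frame ends at the current row's date."""
    by_item = defaultdict(list)
    for item, day, revenue in rows:
        by_item[item].append((day, revenue))

    out = []
    for item in sorted(by_item):             # PARTITION BY item_id
        part = sorted(by_item[item])         # ORDER BY server_date
        for day, revenue in part:
            frame = [r for d, r in part if day - window <= d <= day]
            out.append((item, day, revenue, sum(frame) / len(frame)))
    return out


for item, day, revenue, avg in running_avg(SALES):
    print(item, day, f"{revenue:,.2f}", f"{avg:,.2f}")
```

Because the frame is defined by date range rather than row count, item 2 on 2017-02-04 averages only itself: there is no 2017-02-03 row inside its one-day window.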
8. Enterprise Grade
• Enterprise Grade Security
– SSL, role-based access, auditability
– MaxScale database firewall
• Deployment Flexibility
– Run on commodity Linux servers on premises or in the cloud
– AWS-optimized AMI
– Add horizontal capacity as you grow
• High Availability
– Automatic UM failover
– Automatic PM failover with distributed data attachment across all PMs in SAN and EBS environments
[Diagram: Load Balancer (MaxScale) in front of the User Modules (UM), which drive the Performance Modules (PM) over Shared-Nothing Distributed Data Storage, compressed by default]
9. MariaDB ColumnStore Architecture
[Diagram: Application layer (BI Tool, SQL Client, Custom Big Data App) → MariaDB SQL Front End → Distributed Query Engine → Data Storage: Columnar Distributed Data Storage on Local Storage | SAN | NAS | EBS | GlusterFS]
10. MariaDB ColumnStore
[Diagram: SQL arrives at the User Module (UM), which sends column primitives to the Performance Modules (PM) over Shared Nothing Distributed Data Storage]
• Query received and parsed by the MariaDB Front End on the UM
• Storage Engine Plugin breaks the query down into primitive operations and distributes them across the PMs
• Primitives processed on the PMs
– One thread working on a range of rows
– Execute column restrictions and projections
– Execute group by/aggregation against local data
• Each PM works on primitives in parallel, fully distributed
• Each primitive executes in a fraction of a second
• Intermediate results returned to the UM
[Diagram: primitives flow down to the PMs; intermediate results flow back up to the UM]
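The flow above is a scatter-gather pattern. The Python sketch below models it with a thread pool standing in for the PMs: the "UM" splits the rows into ranges, each "PM" task applies a filter and a partial GROUP BY/SUM to its range, and the UM merges the intermediate results. The function names (pm_primitive, um_execute) and parameters are hypothetical; real primitives operate on column blocks inside PrimProc processes spread across servers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def pm_primitive(keys, values, lo, hi, min_value):
    """One 'PM' task over rows [lo, hi): filter, then partial GROUP BY/SUM."""
    partial = Counter()
    for key, value in zip(keys[lo:hi], values[lo:hi]):
        if value >= min_value:        # column restriction (WHERE value >= min_value)
            partial[key] += value     # aggregation against local data
    return partial


def um_execute(keys, values, min_value, num_pms=3, range_size=4):
    """'UM': split rows into ranges, fan out to PM workers, merge the partials."""
    ranges = [(lo, min(lo + range_size, len(keys)))
              for lo in range(0, len(keys), range_size)]

    def run(r):
        return pm_primitive(keys, values, r[0], r[1], min_value)

    total = Counter()
    with ThreadPoolExecutor(max_workers=num_pms) as pool:
        for partial in pool.map(run, ranges):   # intermediate results
            total.update(partial)               # final aggregation on the UM
    return dict(total)


keys = ["a", "b", "a", "c", "b", "a", "c", "a", "b", "c"]
values = [5, 1, 7, 2, 9, 3, 8, 4, 6, 10]
print(um_execute(keys, values, min_value=3))   # {'a': 19, 'b': 15, 'c': 18}
```

SUM merges cleanly because partial sums can simply be added; an aggregate such as AVG would have each PM return (sum, count) pairs for the UM to combine.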
13. User Module at a Glance
MariaDB
• Functionality: hosts MariaDB; connection management; SQL parsing & optimization
• Value: ✓ Familiar DBMS interface; ✓ Leverages existing partner integrations; ✓ Delivers rich SQL syntax support
Extent Map
• Functionality: abstracts physical and logical storage; metadata store
• Value: ✓ Enables partition elimination
ExeMgr
• Functionality: work distribution; final results management and aggregation
• Value: ✓ Multi-threaded to take advantage of multi-core HW platforms
14. Performance Module at a Glance
PrimProc
• Functionality: scale-out cache management; distributed scan, filter, join and aggregation operations; resource management
• Value: ✓ Independent scalability and tunable performance; ✓ Multi-threaded to take advantage of multi-core HW platforms
Data
• Functionality: high-speed bulk load; transactional DML and DDL; online schema extensions
• Value: ✓ Non-blocking read enabled; ✓ Multi-threaded to take advantage of multi-core HW platforms
15. Compression with Data Storage Layer
• Logical layer: Extent 1 (8MB~64MB, 8 million rows), made up of 8KB blocks
• Physical layer: Segment File 1 (maps to an extent), stored as compression chunks
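The 8MB~64MB range follows from the row count and the column width. Assuming "8 million rows" means 2^23 = 8,388,608 rows per extent (an assumption that makes the figures come out exactly) and 8KB = 8,192-byte blocks, a quick calculation gives the uncompressed extent size and block count for each fixed column width:

```python
ROWS_PER_EXTENT = 8 * 1024 * 1024   # assumption: "8 million rows" = 2**23 rows
BLOCK_SIZE = 8 * 1024               # 8KB blocks


def extent_geometry(width_bytes):
    """Uncompressed extent size (MiB) and number of 8KB blocks for one column."""
    size = ROWS_PER_EXTENT * width_bytes
    return size // (1024 * 1024), size // BLOCK_SIZE


for width in (1, 2, 4, 8):
    mib, blocks = extent_geometry(width)
    print(f"{width}-byte column: {mib} MiB per extent, {blocks} blocks")
# 1-byte column: 8 MiB per extent, 1024 blocks
# ...
# 8-byte column: 64 MiB per extent, 8192 blocks
```

These are pre-compression sizes; on disk the segment file holds the compressed chunks.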
16. Extent Map: Key meta-structure that powers MariaDB ColumnStore’s performance
• A catalog of all extents:
– Minimum and maximum values for a column’s data within an extent
– Corresponding blocks for each extent
• Master copy of the Extent Map on the primary PM node
• Upon system startup, copied to all other UM and PM nodes for disaster recovery and failover purposes
• Extent Map resident in memory for quick access at all nodes
• As extents are modified, updates are broadcast to all participating nodes
• Stores about 64 bytes for each 8-64 MB extent on disk
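Because the Extent Map keeps only about 64 bytes per extent, it stays small enough to hold in memory on every node. A back-of-the-envelope estimate follows, assuming one entry per column per extent and 2^23 = 8,388,608 rows per extent (both assumptions; dictionary columns add further extents that this ignores). The table size of 1 billion rows and 20 columns is hypothetical.

```python
import math

ENTRY_BYTES = 64                   # "about 64 bytes" per extent, from the slide
ROWS_PER_EXTENT = 8 * 1024 * 1024  # assumption: "8 million rows" = 2**23 rows


def extent_map_bytes(rows, columns):
    """Approximate Extent Map size: one entry for every extent of every column."""
    extents_per_column = math.ceil(rows / ROWS_PER_EXTENT)
    return extents_per_column * columns * ENTRY_BYTES


print(extent_map_bytes(rows=1_000_000_000, columns=20), "bytes")   # 153600 bytes
```

About 150 KiB of metadata describes a billion-row table, which is why broadcasting and caching it on all nodes is cheap.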
17. Extent Map
When performing queries:
• Consider only the extents of the columns used in join and filter conditions
• Use each extent's minimum and maximum values for those columns to eliminate extents that cannot contain matching rows
Multiple columns can be used together for partition elimination
Transitive properties apply: a filter on a dimension column (a date, for example) can enable partition elimination on the fact table
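The transitive case can be sketched in Python as follows. A filter on a dimension table (here a hypothetical date dimension keyed by an integer date_key) first yields the set of matching keys; their minimum and maximum then act as a range filter on the fact table's date_key column, so any fact extent whose min/max range misses [lo, hi] is skipped without being read. All table contents and names are invented for illustration.

```python
# Hypothetical date dimension rows: (date_key, year, month)
DIM_DATE = [
    (20160115, 2016, 1), (20160220, 2016, 2), (20160310, 2016, 3),
    (20160412, 2016, 4), (20160518, 2016, 5), (20160623, 2016, 6),
]

# Hypothetical Extent Map entries for the fact table's date_key column:
# (extent name, min date_key, max date_key)
FACT_EXTENTS = [
    ("fact extent 1", 20160101, 20160228),
    ("fact extent 2", 20160301, 20160430),
    ("fact extent 3", 20160501, 20160630),
]


def extents_to_scan(dim_rows, predicate, fact_extents):
    """Dimension filter -> key range [lo, hi] -> fact extents that may match."""
    keys = [key for key, year, month in dim_rows if predicate(year, month)]
    if not keys:
        return []                  # no qualifying keys: every fact extent is skipped
    lo, hi = min(keys), max(keys)
    return [name for name, ext_min, ext_max in fact_extents
            if ext_max >= lo and ext_min <= hi]


# The filter is on the dimension only (months 3-4); elimination happens on the fact side.
print(extents_to_scan(DIM_DATE, lambda year, month: 3 <= month <= 4, FACT_EXTENTS))
# ['fact extent 2']
```

Only the key range is used, so a scattered key set (for example months 1 and 6) spans every extent and eliminates nothing; elimination works best when fact rows are loaded roughly in key order.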
18. Data Types
At the physical layer, all columns are stored as:
• 1-byte field, with 8192 values per 8k block
• 2-byte field, with 4096 values per 8k block
• 4-byte field, with 2048 values per 8k block
• 8-byte field, with 1024 values per 8k block
• Dictionary structure, made up of 2 files/extents with:
– an 8-byte fixed-length token (pointer)
– a variable-length value stored at the location identified by the pointer
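A dictionary column can be pictured as two parallel stores: a fixed-width column of 8-byte tokens and a separate store holding the variable-length values. In the minimal Python model below, the token is simply the value's byte offset in the store, packed as an unsigned 64-bit integer, and each value carries a 4-byte length prefix. This encoding is hypothetical and chosen for clarity; it is not ColumnStore's actual token format.

```python
import struct


class DictionaryColumn:
    """Toy dictionary column: 8-byte tokens pointing into a value store."""

    def __init__(self):
        self.tokens = bytearray()   # fixed-width part: 8 bytes per row
        self.store = bytearray()    # variable-length part: length-prefixed values

    def append(self, value: str) -> None:
        data = value.encode("utf-8")
        offset = len(self.store)
        self.store += struct.pack("<I", len(data)) + data
        self.tokens += struct.pack("<Q", offset)   # the 8-byte "pointer"

    def get(self, row: int) -> str:
        (offset,) = struct.unpack_from("<Q", self.tokens, row * 8)
        (length,) = struct.unpack_from("<I", self.store, offset)
        start = offset + 4
        return self.store[start:start + length].decode("utf-8")


col = DictionaryColumn()
for phone in ["(718) 938-3235", "(209) 375-6572"]:
    col.append(phone)
print(len(col.tokens), col.get(1))   # 16 (209) 375-6572
```

Scans and filters over the token column stay fixed-width and block-aligned; reading the actual value costs a second lookup into the store, which is the dictionary overhead the data-modeling advice tries to avoid.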
19. Data Types
At the physical layer, all columns are stored as:
• 1-byte field, e.g. TinyInt, Char(1)
• 2-byte field, e.g. SmallInt, Char(2)
• 4-byte field, e.g. Int, Char(3), Char(4), date, float
• 8-byte field, e.g. BigInt, Char(5-8), datetime, real/double
• Dictionary, e.g. Varchar(8) or larger, Char(9) or larger
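The mapping on this slide can be written as a small lookup. The Python function below is a hypothetical helper that encodes only the examples listed here plus the dictionary thresholds (Varchar(8) or larger, Char(9) or larger). It derives values per 8k block as 8192 divided by the field width. Types not named on the slide, including Varchar below 8, are rejected rather than guessed.

```python
import re

BLOCK_SIZE = 8192
FIXED = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "DATE": 4, "FLOAT": 4,
         "BIGINT": 8, "DATETIME": 8, "REAL": 8, "DOUBLE": 8}


def physical_storage(sql_type: str):
    """Return (width_bytes, values_per_8k_block) or ("dictionary", None)."""
    t = sql_type.strip().upper()
    if t in FIXED:
        width = FIXED[t]
    else:
        m = re.fullmatch(r"(CHAR|VARCHAR)\((\d+)\)", t)
        if not m or int(m.group(2)) < 1:
            raise ValueError(f"{sql_type!r} is not covered by this sketch")
        kind, n = m.group(1), int(m.group(2))
        if (kind == "CHAR" and n >= 9) or (kind == "VARCHAR" and n >= 8):
            return "dictionary", None
        if kind == "VARCHAR":
            raise ValueError(f"{sql_type!r} is not covered by this sketch")
        width = 1 if n == 1 else 2 if n == 2 else 4 if n <= 4 else 8
    return width, BLOCK_SIZE // width


for t in ["TinyInt", "Char(3)", "Char(8)", "Char(9)", "Varchar(20)", "datetime"]:
    print(t, physical_storage(t))
```

A helper like this makes the 8-byte boundary visible when reviewing a schema: widening a Char(8) to Char(9) moves the column from a fixed 8-byte field into dictionary storage.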
20. Sizing
Minimum spec
• UM: 4 core, 32G RAM
• PM: 4 core, 16G RAM
Typical server spec
• PM: 8 core, 64G RAM
• UM: 8 core, 64G RAM
Data storage
• External data volumes
– Maximum 2 data volumes per IO channel per PM node server
– Up to 2TB on the disk per data volume ≈ max 4TB per PM node
• Local disk
– Up to 2TB on the disk per PM node server
DETAILED SIZING GUIDE based on data size and workload
21. Sizing - Example
• MariaDB ColumnStore: 60TB uncompressed data = 6TB compressed data at 10x compression
• 2 UMs: 8 core, 512G RAM (based on workload)
• 6TB compressed = 3 data volumes (at 2TB per volume)
– With 1 data volume per PM node: 3 PMs
• Data growth: 2TB per month; data retention: 2 years
– Plan for 2TB × 24 = 48TB additional
– 48TB = 4.8TB compressed = 2.4 data volumes, rounded up to 3 (at 2TB per volume); with 1 data volume per PM node: 3 additional PMs
• Total: 6 PMs, 2 UMs
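The PM count in this example is simple arithmetic. The Python sketch below reproduces it under the example's own assumptions: 10x compression, 2TB per data volume, one data volume per PM, and partial volumes rounded up. UM count and memory depend on workload and are not modeled.

```python
import math


def pms_needed(uncompressed_tb, compression_ratio=10, tb_per_volume=2, volumes_per_pm=1):
    """Return (compressed TB, data volumes, PM nodes) for a given raw data size."""
    compressed_tb = uncompressed_tb / compression_ratio
    volumes = math.ceil(compressed_tb / tb_per_volume)
    return compressed_tb, volumes, math.ceil(volumes / volumes_per_pm)


initial = pms_needed(60)        # current data
growth = pms_needed(2 * 24)     # 2TB/month retained for 24 months
print(initial, growth, "total PMs:", initial[2] + growth[2])
# (6.0, 3, 3) (4.8, 3, 3) total PMs: 6
```

Changing volumes_per_pm to 2 (the slide's maximum per IO channel) shows how the node count drops when each PM can attach more storage.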
23. Windowing Functions
Supported functions: MAX, MIN, COUNT, SUM, AVG, VARIANCE, VAR_POP, VAR_SAMP, STD, STDDEV, STDDEV_POP, STDDEV_SAMP, ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK, NTH_VALUE, FIRST_VALUE, LAST_VALUE, CUME_DIST, LAG, LEAD, NTILE, PERCENTILE_CONT, PERCENTILE_DISC, MEDIAN
• Aggregate over a series of related rows
• Simplified functions for complex statistical analytics over a sliding window per row
– Cumulative, moving or centered aggregates
– Simple statistical functions like rank, max, min, average, median
– More complex functions such as distribution, percentile, lag, lead
– Without running complex sub-queries
Source: InfiniDB SQL Syntax Guide
24. Window Function Example: Top N Visitors for Each Month
• Total for each visitor by month
• Filter on the rank:
– Top 1: Time_rank = 1
– Top 2: Time_rank <= 2
– Top N: Time_rank <= N
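The top-N-per-group pattern can be sketched in Python: total each visitor's activity by month, rank visitors within each month by that total using RANK semantics (ties share a rank), and keep rows with time_rank <= N. The visit data and field names (visitor, month, duration) are hypothetical; only the time_rank filter comes from the slide.

```python
from collections import defaultdict

VISITS = [  # hypothetical (visitor, month, duration_minutes)
    ("ann", "2017-01", 30), ("bob", "2017-01", 45), ("ann", "2017-01", 25),
    ("cat", "2017-01", 10), ("bob", "2017-02", 5), ("cat", "2017-02", 50),
    ("ann", "2017-02", 20),
]


def top_n_per_month(visits, n):
    totals = defaultdict(int)
    for visitor, month, duration in visits:
        totals[(month, visitor)] += duration          # total per visitor by month

    by_month = defaultdict(list)
    for (month, visitor), total in totals.items():
        by_month[month].append((visitor, total))

    result = []
    for month in sorted(by_month):                    # PARTITION BY month
        ranked = sorted(by_month[month], key=lambda vt: -vt[1])
        for visitor, total in ranked:
            # RANK(): 1 + number of visitors with a strictly larger total
            time_rank = 1 + sum(1 for _, t in ranked if t > total)
            if time_rank <= n:
                result.append((month, visitor, total, time_rank))
    return result


print(top_n_per_month(VISITS, 2))
```

In SQL the same two steps become a GROUP BY inside a derived table and a RANK() OVER (PARTITION BY month ORDER BY total DESC) filtered in an outer query.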
25. Data Modeling Best Practices
• Star-schema optimizations are generally a good idea
• Conservative data typing is very important
– Especially around the fixed-length vs. dictionary boundary (8 bytes)
– IP address vs. IP number
• Break down compound fields into individual fields:
– Trivializes searching for sub-fields
– Can avoid dictionary overhead
– Cost to re-assemble is generally small
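As an example of conservative typing, a dotted IPv4 string such as '192.168.1.10' needs up to 15 characters, which puts it past the 8-byte boundary into dictionary storage, while the same address as an unsigned 32-bit number fits a 4-byte field. Python's standard ipaddress module shows the round trip; MariaDB also provides INET_ATON() and INET_NTOA() for this conversion. The phone-number split below illustrates breaking down a compound field; the choice of sub-fields is hypothetical.

```python
import ipaddress
import re


def ip_to_number(ip: str) -> int:
    """Dotted IPv4 string -> unsigned 32-bit integer (fits a 4-byte column)."""
    return int(ipaddress.IPv4Address(ip))


def number_to_ip(n: int) -> str:
    """Re-assemble the dotted form only when it is needed for display."""
    return str(ipaddress.IPv4Address(n))


def split_phone(phone: str) -> tuple:
    """Hypothetical split of '(718) 938-3235' into area code, exchange, line."""
    m = re.fullmatch(r"\((\d{3})\) (\d{3})-(\d{4})", phone)
    if not m:
        raise ValueError(f"unexpected phone format: {phone!r}")
    return m.groups()


n = ip_to_number("192.168.1.10")
print(n, number_to_ip(n))              # 3232235786 192.168.1.10
print(split_phone("(718) 938-3235"))   # ('718', '938', '3235')
```

Stored as numbers, a range such as a /24 subnet becomes a simple BETWEEN on a fixed-width column, which also lets extent min/max values prune the scan.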
26. Extent Elimination
Storage Architecture reduces I/O
• Only touch column files that are in filter, projection, group by, and join conditions
• Eliminate disk block touches to partitions outside filter and join conditions
SELECT Item, sum(Quantity) FROM Orders
WHERE ShipDate between '2016-01-01' and '2016-01-31'
GROUP BY Item
Id OrderId Line Item Quantity Price Supplier ShipDate ShipMode
1 1 1 Laptop 5 1000 Dell 2016-01-12 G
2 1 2 Monitor 5 200 LG 2016-01-13 G
3 2 1 Mouse 1 20 Logitech 2016-02-05 M
4 3 1 Laptop 3 1600 Apple 2016-01-31 P
... ... ... ... ... ... ... ... ...
8M 2016-03-05
8M+1 2016-03-05
... ... ... ... ... ... ... ... ...
16M 2016-09-23
16M+1 2016-09-24
... ... ... ... ... ... ... ... ...
24M 2017-01-06
Horizontal partitions of 8 million rows each:
• Extent 1 (rows 1 to 8M), ShipDate: 2016-01-12 - 2016-03-05: scanned
• Extent 2 (rows 8M+1 to 16M), ShipDate: 2016-03-05 - 2016-09-23: ELIMINATED PARTITION
• Extent 3 (rows 16M+1 to 24M), ShipDate: 2016-09-24 - 2017-01-06: ELIMINATED PARTITION
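Extent elimination needs nothing more than the per-extent minimum and maximum held in the Extent Map. The Python sketch below applies the query's ShipDate range to the three extents from this example: an extent is skipped when its maximum falls before the range start or its minimum falls after the range end. It models the idea only, not ColumnStore's implementation.

```python
from datetime import date

# (extent, min ShipDate, max ShipDate) as recorded for the ShipDate column
EXTENT_MAP = [
    ("Extent 1", date(2016, 1, 12), date(2016, 3, 5)),
    ("Extent 2", date(2016, 3, 5), date(2016, 9, 23)),
    ("Extent 3", date(2016, 9, 24), date(2017, 1, 6)),
]


def extents_to_scan(extent_map, lo, hi):
    """Keep extents whose [min, max] overlaps the BETWEEN lo AND hi range."""
    return [name for name, ext_min, ext_max in extent_map
            if not (ext_max < lo or ext_min > hi)]


scan = extents_to_scan(EXTENT_MAP, date(2016, 1, 1), date(2016, 1, 31))
print(scan)   # ['Extent 1']
```

Two of the three extents are ruled out before any block is read, which is what the PartitionBlocksEliminated counter in calGetStats() reports.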
27. Tuning Commands
MariaDB [test]> select count(*) from t1 where i = 5;
+----------+
| count(*) |
+----------+
| 2200000 |
+----------+
1 row in set (0.27 sec)
MariaDB [test]> select calGetStats()\G
*************************** 1. row ***************************
calGetStats(): Query Stats: MaxMemPct-0; NumTempFiles-0; TempFileSpace-0B;
ApproxPhyI/O-11042; CacheI/O-11042; BlocksTouched-11042; PartitionBlocksEliminated-0;
MsgBytesIn-332KB; MsgBytesOut-3KB; Mode-Distributed
1 row in set (0.00 sec)
calGetStats: Information On The Last Query Executed Within A Given Session
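When collecting these statistics from a script, the semicolon-separated Name-Value string returned by calGetStats() is easy to split. The helper below is a hypothetical parser written against the sample output above. It keeps values as strings because some carry units (for example 332KB). The field names are copied from that output, and the format may differ between versions.

```python
def parse_calgetstats(text: str) -> dict:
    """Parse 'Query Stats: Key-Value; Key-Value; ...' into a dict of strings."""
    body = text.split(":", 1)[1] if text.startswith("Query Stats") else text
    stats = {}
    for item in body.split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition("-")   # split at the first '-'
            stats[key] = value
    return stats


sample = ("Query Stats: MaxMemPct-0; NumTempFiles-0; TempFileSpace-0B; "
          "ApproxPhyI/O-11042; CacheI/O-11042; BlocksTouched-11042; "
          "PartitionBlocksEliminated-0; MsgBytesIn-332KB; MsgBytesOut-3KB; "
          "Mode-Distributed")
stats = parse_calgetstats(sample)
print(stats["BlocksTouched"], stats["PartitionBlocksEliminated"])   # 11042 0
```

Logged over time, a rising BlocksTouched with PartitionBlocksEliminated stuck at 0 is a hint that filters are not lining up with the extents' min/max ranges.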
28. MariaDB [test]> select calSetTrace(1);
+----------------+
| calSetTrace(1) |
+----------------+
| 0 |
+----------------+
1 row in set (0.00 sec)
MariaDB [test]> select d.name dept_name,
-> count(*) emp_count,
-> sum(e.salary) salary_cost
-> from emp e
-> join i_dept d on e.dept_id = d.dept_id
-> group by dept_name;
+-------------+-----------+-------------+
| dept_name | emp_count | salary_cost |
+-------------+-----------+-------------+
| Engineering | 2 | 2500 |
| Sales | 2 | 3800 |
+-------------+-----------+-------------+
2 rows in set, 1 warning (0.03 sec)
Tuning Commands
calGetTrace: Detailed distributed query execution plan
MariaDB [test]> select calGetTrace()\G
*************************** 1. row ***************************
calGetTrace():
Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows
CES UM - - - - - - 0.000 2
BPS PM e 3013 (dept_id,salary) 2 4 0 0.008 2
HJS PM e-d 3013 - - - - ----- -
TAS UM - - - - - - 0.000 2
1 row in set (0.00 sec)
29. Tuning Commands
Query Statistics
Users can view query statistics by selecting rows from the querystats table in the infinidb_querystats schema.
Example 1: List execution time and rows returned for all SELECT queries within the past 12 hours
select queryid, query, timestampdiff(second, starttime, endtime) as exec_seconds,
`rows` from querystats where starttime >= now() - interval 12 hour
and querytype = 'SELECT';
Example 2: List the average, min and max running time of all INSERT SELECT queries within the past 12 hours
select min(timestampdiff(second, starttime, endtime)),
max(timestampdiff(second, starttime, endtime)),
avg(timestampdiff(second, starttime, endtime))
from querystats where querytype = 'INSERT SELECT'
and starttime >= now() - interval 12 hour;
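A caution on elapsed-time arithmetic: in MariaDB, subtracting one DATETIME or TIMESTAMP value from another in a numeric context compares their YYYYMMDDhhmmss number forms rather than returning seconds, so endtime-starttime is only right when both values fall within the same minute. TIMESTAMPDIFF(SECOND, starttime, endtime) returns true elapsed seconds. The Python sketch below mimics both calculations for a hypothetical query that ran from 08:38:59 to 08:39:01.

```python
from datetime import datetime


def numeric_difference(start: datetime, end: datetime) -> int:
    """What 'end - start' yields when both are treated as YYYYMMDDhhmmss numbers."""
    def as_number(dt: datetime) -> int:
        return int(dt.strftime("%Y%m%d%H%M%S"))
    return as_number(end) - as_number(start)


def elapsed_seconds(start: datetime, end: datetime) -> int:
    """What TIMESTAMPDIFF(SECOND, start, end) returns."""
    return int((end - start).total_seconds())


start = datetime(2015, 10, 7, 8, 38, 59)
end = datetime(2015, 10, 7, 8, 39, 1)
print(numeric_difference(start, end), elapsed_seconds(start, end))   # 42 2
```

The 2-second query appears to take 42 "seconds" because the minute rolled over; across an hour or day boundary the error grows much larger.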
30. calpont> getActiveSQLStatements
getactivesqlstatements Wed Oct 7 08:38:32 2015
Get List of Active SQL Statements
=================================
Start Time Time (hh:mm:ss) Session ID SQL Statement
---------------- ---------------- -------------------- -----------------------------------------------------------
Oct 7 08:38:30 00:00:03 73 select c_name,sum(lo_revenue) from customer, lineorder where
lo_custkey = c_custkey and c_custkey = 6 group by c_name
getActiveSQLStatements: List Active SQL Statements within the System
mysql> show processlist;
+----+------+-----------+-------+---------+------+-------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------+-------+---------+------+-------+------------------+
| 73 | root | localhost | ssb10 | Query | 0 | NULL | show processlist |
+----+------+-----------+-------+---------+------+-------+------------------+
Tuning Commands
33. Bulk Data Load: cpimport
• Fastest way to load data into MariaDB ColumnStore
• Load data from a CSV file:
cpimport dbName tblName [loadFile]
• Load data from standard input:
mysql -e 'select * from source_table;' -N db2 | cpimport destination_db destination_tbl -s '\t'
• Load data from a binary source file:
cpimport -I1 mydb mytable sourcefile.bin
• Multiple tables can be loaded in parallel by launching multiple jobs
• Read queries continue without being blocked
• A successful cpimport is auto-committed
• In case of errors, the entire load is rolled back
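For the file-based path, cpimport reads one row per line with fields separated by a delimiter: the pipe character (|) by default, or the character given with -s, as in the standard-input example. The Python sketch below writes a small pipe-delimited load file and builds, but does not run, the matching command line. The database, table and file names are hypothetical, and the sketch assumes no field contains the delimiter, a quote character or a newline.

```python
import csv
import shlex
import tempfile
from pathlib import Path


def write_load_file(path: Path, rows, delimiter: str = "|") -> None:
    """Write rows as delimiter-separated lines (one row per line) for cpimport."""
    with path.open("w", newline="") as f:
        writer = csv.writer(f, delimiter=delimiter, lineterminator="\n")
        writer.writerows(rows)


def cpimport_command(db: str, table: str, load_file: Path) -> str:
    """Build the 'cpimport dbName tblName loadFile' command line (not executed)."""
    return shlex.join(["cpimport", db, table, str(load_file)])


with tempfile.TemporaryDirectory() as tmp:
    load_file = Path(tmp) / "orders.tbl"
    write_load_file(load_file, [(1, "Laptop", 5), (2, "Monitor", 5)])
    print(load_file.read_text(), end="")
    print(cpimport_command("mydb", "orders", load_file))
```

Because each cpimport job handles one table, loading several tables in parallel amounts to launching one such command per table.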
34. Bulk Data Load: cpimport mode 1
Single File Central Input: data source at the UM
cpimport -m1 mytest mytable mytable.tbl
[Diagram: cpimport and the source file on the UM node (name node), distributing data to three PM nodes (data nodes)]
35. Bulk Data Load: cpimport mode 2
Distributed Input: data source at the PMs
• Partitioned load file on each PM
cpimport -m2 testdb mytable /home/mydata/mytable.tbl
[Diagram: cpimport invoked on the UM node (name node); each of the three PM nodes (data nodes) holds its own source file]
36. Bulk Data Load: cpimport mode 3
Distributed Input: data source at the PMs
• Partitioned load file on each PM
• Bulk load command run at one or more PMs
cpimport -m3 testdb mytable /home/mydata/mytable.tbl
[Diagram: UM node (name node) and three PM nodes (data nodes), each PM running its own cpimport against its local source file]
37. Bulk Data Load: LOAD DATA INFILE
• Traditional way of importing data into any MariaDB storage engine table
• Up to 2 times slower than cpimport for large imports
• Whether the operation succeeds or fails, it can be rolled back
mysql> load data infile '/tmp/outfile1.txt' into table destinationTable;
Query OK, 9765625 rows affected (2 min 20.01 sec)
Records: 9765625 Deleted: 0 Skipped: 0 Warnings: 0
38. Bulk Data Export
Central Export
• Connect with ODBC, JDBC or mysql client to the UM
• Extract SQL query results into an output file on the UM
Distributed Export
• Fastest way to do export
• Use the LOCAL PM query feature
• Connect ODBC, JDBC or mysql client to each PM
• Extract SQL query results into an output file on each PM
40. Technical Use Cases
• Data Warehousing
– Selective column-based queries
– Large number of dimensions
• High Performance Analytics on Large Volume of Data
– Reporting and analysis on millions or billions of rows from datasets containing millions to trillions of rows
– Terabytes to petabytes of datasets
• Analytics Requiring Complex Joins, Windowing Functions
41. Customer Use Cases
Industry | Category | Use Case
Gaming | Behavior Analytics | Projecting and predicting user behavior based on past and current data
Advertising | Customer Analytics | Customer behavior data for market segmentation and predictive analytics
Advertising | Loyalty Analytics | Customer analytics focusing on a person’s commitment to a product, company, or brand
Web, E-commerce | Click Stream Analytics | Web activity analysis, software testing and market research using data about which areas of web pages are clicked while browsing [Deal News]
Marketing | Promotional Testing | Using marketing and campaign management data to identify the best criteria for a particular marketing offer
Social Network | Network Analytics | Relationship analytics among network nodes
Financial | Fraud Analytics | Monitoring user financial transactions and identifying patterns of behavior to predict and detect abnormal or fraudulent activity, preventing damage to the user and the institution
Healthcare | Patient Analytics | Analyzing patient medical records to identify patterns for improved medical treatment
Healthcare | Clinical Analytics | Analyzing clinical data and its impact on patients to identify patterns for improved medical treatment
Telco | Network and Application Performance Analytics | Streaming data from network devices and applications, enriched with business operations data, to uncover actionable insights for network planning, operations and marketing analytics
Aviation | Flight Analytics | Proactively projecting parts replacement, maintenance and airplane retirement based on real-time and historically collected flight parameter data [Boeing]
42. Coming Soon - ColumnStore 1.1
● Text / Blob datatype support
● Bulk Write API Connector
○ Kafka integration
○ Replication integration
○ Custom
● User Defined Aggregate & Window functions.
● Data Redundancy for local storage.
● Installation improvements.
● Performance & stability improvements.
● MariaDB Server 10.2