SlideShare a Scribd company logo
1 of 43
Download to read offline
MariaDB
ColumnStore
Andrew Hutchings (LinuxJedi)
Senior Software Engineer
Overview Architecture
Columnar
Distributed
Storage
MariaDB ColumnStore
• GPLv2 Open Source
• Columnar, Massively Parallel
MariaDB Storage Engine
• Scalable, high-performance
analytics platform
• Built in redundancy and
high availability
• Runs on premise, on AWS cloud
• Full SQL syntax and capabilities
regardless of platform
Big Data Sources Analytics Insight
MariaDB ColumnStore
. . .
Node 1 Node 2 Node 3 Node N
Local / AWS®
/ GlusterFS ®
ELT
Tools
BI
Tools
MariaDB ColumnStore
High performance columnar storage engine that support wide variety of
analytical use cases with SQL in a highly scalable distributed environments
Parallel query
processing for
distributed
environments
Faster, More
Efficient Queries
Single SQL Interface
for OLTP and
analytics
Easier Enterprise
Analytics
Power of SQL and
Freedom of Open
Source to Big Data
Analytics
Better Price
Performance
OLTP/NoSQL
Workloads
Suited for reporting or analysis of millions-billions of rows from data sets containing millions-trillions of rows.
OLAP/Analytic/
Reporting Workloads
Workload – Query Vision/Scope
1 100 10,000
10-100GB
10,000,000,000
1-10TB
1,000,000 100,000,000
100-1,000GB
Row-oriented vs. Column-oriented format
• Row oriented
– Rows stored sequentially in
a file
– Scans through every record
row by row
• Column oriented:
– Each column is stored in a
separate file
– Scans only the relevant
columns
ID Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
ID
1
2
3
4
5
Fname
Bugs
Yosemite
Daffy
Elmer
Witch
Lname
Bunny
Sam
Duck
Fudd
Hazel
State
NY
CA
NY
ME
MA
Zip
11217
95389
10013
04578
01970
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
(978) 744-0991
Age
34
52
35
43
57
Sex
M
M
M
M
F
SELECT Fname FROM People WHERE State = 'NY'
Analytics
• In-database distributed analytics with complex
join, aggregation, window functions
• Cross Engine Join allows for queries to be
executed referencing both columnstore and
non-columnstore tables.
• Extensible User Defined Functions allow
creation of specialized logic executed at PM
level.
• Standard MariaDB Connectors provide for out
of the box integration with:
– BI Tools (Tableau, Pentaho, ..)
– Custom Application Code (Java, Scala, C#,
Python, ..)
– Data Processing Frameworks (R, Spark,
Numpy, ..)
Item ID Server_date Revenue
1 2017-02-01 20,000.0
1 2017-02-02 5,001.00
2 2017-02-01 15,000.0
2 2017-02-04 34,029.0
2 2017-02-05 7,138.00
3 2017-02-01 17,250.0
3 2017-02-03 25,010.0
3 2017-02-04 21,034.0
3 2017-02-05 4,120.00
Running Average
20,000.00
12,500.50
15,000.00
34,029.00
20,583.50
17,250.00
25,010.00
23,022.00
12,577.00
Window Function Example: Daily Running Average Revenue by Item
SELECT item_id, server_date, daily_revenue,
AVG(revenue) OVER
(PARTITION BY item_id ORDER BY server_date
RANGE INTERVAL 1 DAY PRECEDING ) running_avg
FROM web_item_sales
BI Tool
Custom
Big Data App
Data
Processing
Framework
JDBC / ODBC / Connector
Enterprise Grade
• Enterprise Grade Security
– SSL, role based access, auditability.
– MaxScale database firewall
• Deployment Flexibility
– Run on commodity Linux servers on premise
or in the cloud.
– AWS optimized AMI Image.
– Add horizontal capacity as you grow.
• High Availability
– Automatic UM failover
– Automatic PM failover with distributed data
attachment across all PMs in SAN and EBS
environment
Shared-Nothing Distributed Data Storage
Compressed by default
User
Module
(UM)
Performance
Module
(PM)
Data Storage
Load
Balancer -
MaxScale
MariaDB ColumnStore Architecture
Columnar Distributed Data Storage
Local Storage | SAN | NAS | EBS | Gluster FS
BI Tool SQL Client Custom
Big Data App
Application
MariaDB
SQL Front
End
Distributed
Query Engine
Data Storage
MariaDB ColumnStore
Shared Nothing Distributed Data Storage
SQL
Column
Primitives
User
Module
Performance
Module
UM
PM
• Query received and parsed by
MariaDB Front End on UM
• Storage Engine Plugin breaks down query in
primitive operations and distributes across PM
• Primitives processed on PM
• One thread working on a range of rows
• Execute column restrictions and projections
• Execute group by/aggregation against local data
• Each PM work on Primitives in parallel
and fully distributed
• Each primitive executes in a fraction of a second
• Return intermediate results to UM
Primitives ↓↓↓↓
Intermediate
↑↑Results↑↑
MariaDB ColumnStore
MariaDB ColumnStore
uses standard
“Engine=columnstore”
syntax
mysql> use tpcds_djoshi
Database changed
mysql> select count(*) from store_sales;
+----------+
| count(*) |
+----------+
| 2880404 |
+----------+
1 row in set (1.68 sec)
mysql> describe warehouse;
+-------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| w_warehouse_sk | int(11) | NO | | NULL | |
| w_warehouse_id | char(16) | NO | | NULL | |
| w_warehouse_name | varchar(20) | YES | | NULL | |
| w_warehouse_sq_ft | int(11) | YES | | NULL | |
| w_street_number | char(10) | YES | | NULL | |
| w_street_name | varchar(60) | YES | | NULL | |
| w_street_type | char(15) | YES | | NULL | |
| w_suite_number | char(10) | YES | | NULL | |
| w_city | varchar(60) | YES | | NULL | |
| w_county | varchar(30) | YES | | NULL | |
| w_state | char(2) | YES | | NULL | |
| w_zip | char(10) | YES | | NULL | |
| w_country | varchar(20) | YES | | NULL | |
| w_gmt_offset | decimal(5,2) | YES | | NULL | |
+-------------------+--------------+------+-----+---------+-------+
14 rows in set (0.05 sec)
CREATE TABLE `game_warehouse`.`dim_title` (
`id` INT,
`name` VARCHAR(45),
`publisher` VARCHAR(45),
`release_date` DATE,
`language` INT,
`platform_name` VARCHAR(45),
`version` VARCHAR(45)
) ENGINE=columnstore;
Uses custom scalable
columnar architecture
MariaDB ColumnStore
mysql> use tpcds_djoshi
Database changed
mysql> select count(*) from store_sales;
+----------+
| count(*) |
+----------+
| 2880404 |
+----------+
1 row in set (1.68 sec)
mysql> describe warehouse;
+-------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| w_warehouse_sk | int(11) | NO | | NULL | |
| w_warehouse_id | char(16) | NO | | NULL | |
| w_warehouse_name | varchar(20) | YES | | NULL | |
| w_warehouse_sq_ft | int(11) | YES | | NULL | |
| w_street_number | char(10) | YES | | NULL | |
| w_street_name | varchar(60) | YES | | NULL | |
| w_street_type | char(15) | YES | | NULL | |
| w_suite_number | char(10) | YES | | NULL | |
| w_city | varchar(60) | YES | | NULL | |
| w_county | varchar(30) | YES | | NULL | |
| w_state | char(2) | YES | | NULL | |
| w_zip | char(10) | YES | | NULL | |
| w_country | varchar(20) | YES | | NULL | |
| w_gmt_offset | decimal(5,2) | YES | | NULL | |
+-------------------+--------------+------+-----+---------+-------+
14 rows in set (0.05 sec)
MariaDB Front End
Standard ANSI SQL
Process Functionality Value
MariaDB
• Hosts MariaDB
• Connection management
• SQL parsing & optimization
✓ Familiar DBMS interface
✓ Leverages existing partner integrations
✓ Delivers rich SQL syntax support
Extent Map
• Abstracts physical
and logical storage
• Metadata store
✓ Enables partition elimination
ExeMgr
• Work distribution
• Final results management
and aggregation
✓ Multi-threaded to take advantage
of multi-core HW platforms
User Module at a Glance
Process Functionality Value
PrimProc
• Scale-out cache management
• Distributed scan, filter, join
and aggregation operations
• Resource management
✓ Independent scalability and
tunable performance
✓ Multi-threaded to take advantage
of multi-core HW platforms
Data
• High Speed Bulk Load
• Transactional DML and DDL
• Online schema extensions
✓ Non-blocking read enabled
✓ Multi-threaded to take advantage
of multi-core HW platforms
Performance Module at a Glance
Compression with Data Storage Layer
Blocks (8KB)
Extent1
(8MB~64MB
8 million rows)
Logical
Layer
Segment File1
(maps to an Extent)
Physical
Layer
Compression
Chunks
Key meta-structure that powers MariaDB ColumnStore’s
performance
A catalog of all extents
• Minimum and maximum values for a column’s data within an extent
• Corresponding blocks for each extent
Master copy of the Extent Map on primary PM node
Upon system startup, copied to all other UM and PM
nodes for disaster recovery and failover purposes
Extent Map resident in memory for quick access at all nodes
As extents modified, updates broadcasted to all participating nodes
Stores about 64 bytes for each 8-64 Mbytes on disk
Extent Map
Extent Map
When performing queries:
• Eliminate the extents by taking into consideration only
the extents for the column in join and filter conditions
• Use the minimum and maximum value for the extents for
join columns to filter the columns and eliminate extent
Multiple columns can be used
together for partition elimination
Transitive properties apply, i.e. a filter
on a dimension column (date, for example)
can allow for partition elimination on fact table
• 8-byte fixed length token (pointer).
• A variable length value stored at the
location identified by the pointer.
Data Types
1-byte Field
with 8192 values per
8k block
2-byte Field
with 4096 values
per 8k block
4-byte Field
with 2048 values
per 8k block
8-byte Field
with 1024 values per
8k block
Dictionary structure
made up of 2
files/extents with:
At the physical layer, all columns are stored as:
• Varchar(8) or larger
• Char(9) or larger
Data Types
1-byte Field
Examples
TinyInt, Char(1)
2-byte Field
Examples
SmallInt, Char(2)
4-byte Field
Examples
Int, Char(3),
Char(4), date, float
8-byte Field
Examples
BigInt,
Char(5-8),datetime,
real/double
Dictionary Examples
At the physical layer, all columns are stored as:
Sizing
Minimum Spec
UM
4 core,
32 G RAM PM
4 core,
16 G RAM
Typical Server spec
PM
8 core 64G RAM
UM
8 core, 64G RAM
Data Storage
External Data Volumes
• Maximum 2 data volume per IO
channel per PM node server
• up to 2TB on the disk per data
volume ≈ Max 4 TB per PM node
Local disk
Up to 2TB on the disk per
PM node server
DETAILED SIZING GUIDE
based on data size
and workload
Sizing - Example
• MariaDB ColumnStore 60TB uncompressed data =
6TB compressed data at 10x compression
• 2UM - 8 core 512G(based on work load)
• 6 TB compressed = 3 data volume (at 2TB per volume)
- with 1 data volume per PM node - 3PMs
• Data growth - 2TB per month, Data retention - 2 years
- Plan for 2TB X24 = 48 TB additional
- 48 TB = 4.8TB compressed ≈ 3 data volume(at 2TB per volume)
with 1 data volume per PM node - 3 additional PMs
• Total 6 PMs, 2 UMs
Analytics with
MariaDB
ColumnStore
SQL Features
Aggregation
Windowing Functions
UDF
Tuning
Commands
ETL
MAX RANK
MIN DENSE_RANK
COUNT PERCENT_RANK
SUM NTH_VALUE
AVG FIRST_VALUE
VARIANCE LAST_VALUE
VAR_POP CUME_DIST
VAR_SAMP LAG
STD LEAD
STDDEV NTILE
STDDEV_POP PERCENTILE_CONT
STDDEV_SAMP PERCENTILE_DISC
ROW_NUMBER MEDIAN
• Aggregate over a series of related rows
• Simplified function for complex statistical
analytics over sliding window per row
- Cumulative, moving or centered aggregates
- Simple Statistical functions like rank, max, min,
average, median
- More complex functions such as distribution,
percentile, lag, lead
- Without running complex sub-queries
Windowing Functions
Source : InfiniDB SQL Syntax Guide
Top N Visitors for each Month
Window Function Example
Total for Each
Visitor by Month
Top 1 :
Time_rank = 1
Top 2 :
Time_rank <= 2
Top N :
Time_rank <= N
Data Modeling Best Practices
Star-schema optimizations are generally a good idea
Conservative data typing is very important
Especially around fixed-length vs. dictionary boundary (8 bytes)
IP Address vs. IP Number
Break down compound fields into individual fields:
Trivializes searching for sub-fields
Can avoid dictionary overhead
Cost to re-assemble is generally small
Horizontal
Partition:
8 Million Rows
Extent 2
Horizontal
Partition:
8 Million Rows
Extent 3
Horizontal
Partition:
8 Million Rows
Extent 1
Storage Architecture reduces I/O
• Only touch column files
that are in filter, projection,
group by, and join conditions
• Eliminate disk block touches
to partitions outside filter
and join conditions
Extent 1:
ShipDate: 2016-01-12 - 2016-03-05
Extent 2:
ShipDate: 2016-03-05 - 2016-09-23
Extent 3:
ShipDate: 2016-09-24 - 2017-01-06
SELECT Item, sum(Quantity) FROM Orders
WHERE ShipDate between ‘2016-01-01’ and ‘2016-01-31’
GROUP BY Item
Extent Elimination
Id OrderId Line Item Quantity Price Supplier ShipDate ShipMode
1 1 1 Laptop 5 1000 Dell 2016-01-12 G
2 1 2 Monitor 5 200 LG 2016-01-13 G
3 2 1 Mouse 1 20 Logitech 2016-02-05 M
4 3 1 Laptop 3 1600 Apple 2016-01-31 P
... ... ... ... ... ... ... ... ...
8M 2016-03-05
8M+1 2016-03-05
... ... ... ... ... ... ... ... ...
16M 2016-09-23
16M+1 2016-09-24
... ... ... ... ... ... ... ... ...
24M 2017-01-06
ELIMINATED PARTITION
ELIMINATED PARTITION
Tuning Commands
MariaDB [test]> select count(*) from t1 where i = 5;
+----------+
| count(*) |
+----------+
| 2200000 |
+----------+
1 row in set (0.27 sec)
MariaDB [test]> select calGetStats()G
*************************** 1. row ***************************
calGetStats(): Query Stats: MaxMemPct-0; NumTempFiles-0; TempFileSpace-0B;
ApproxPhyI/O-11042; CacheI/O-11042; BlocksTouched-11042; PartitionBlocksEliminated-0;
MsgBytesIn-332KB; MsgBytesOut-3KB; Mode-Distributed
1 row in set (0.00 sec)
calGetStats: Information On The Last Query Executed Within A Given Session
MariaDB [test]> select calSetTrace(1);
+----------------+
| calSetTrace(1) |
+----------------+
| 0 |
+----------------+
1 row in set (0.00 sec)
MariaDB [test]> select d.name dept_name,
-> count(*) emp_count,
-> sum(e.salary) salary_cost
-> from emp e
-> join i_dept d on e.dept_id = d.dept_id
-> group by dept_name;
+-------------+-----------+-------------+
| dept_name | emp_count | salary_cost |
+-------------+-----------+-------------+
| Engineering | 2 | 2500 |
| Sales | 2 | 3800 |
+-------------+-----------+-------------+
2 rows in set, 1 warning (0.03 sec)
Tuning Commands
calGetTrace: Detailed distributed query execution plan
MariaDB [test]> select calGetTrace()G
*************************** 1. row ***************************
calGetTrace():
Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows
CES UM - - - - - - 0.000 2
BPS PM e 3013 (dept_id,salary) 2 4 0 0.008 2
HJS PM e-d 3013 - - - - ----- -
TAS UM - - - - - - 0.000 2
1 row in set (0.00 sec)
Tuning Commands
Query Statistics
Users can view the query statistics by selecting the rows from the
query stats table in the infinidb_querystats schema.
Example 1 Example 2
List execution time, rows returned
for all the select queries within
the past 12 hours
select queryid, query, endtime-starttime,
rows from querystats where starttime >=
now() - interval 12 hour and querytype =
'SELECT';
List the average, min and max
running time of all the INSERT SELECT
queries within the past 12 hours
select min(endtime-starttime), max(endtime-starttime),
avg(endtimestarttime) from querystats where
querytype='INSERT SELECT' and starttime >=
now() - interval 12 hour;
calpont> getActiveSQLStatements
getactivesqlstatements Wed Oct 7 08:38:32 2015
Get List of Active SQL Statements
=================================
Start Time Time (hh:mm:ss) Session ID SQL Statement
---------------- ---------------- -------------------- -----------------------------------------------------------
Oct 7 08:38:30 00:00:03 73 select c_name,sum(lo_revenue) from customer, lineorder where
lo_custkey = c_custkey andc_custkey = 6 group by c_name
getActiveSQLStatements: List Active SQL Statements within the System
mysql> show processlist;
+----+------+-----------+-------+---------+------+-------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------+-------+---------+------+-------+------------------+
| 73 | root | localhost | ssb10 | Query | 0 | NULL | show processlist |
+----+------+-----------+-------+---------+------+-------+------------------+
Tuning Commands
ETL
31
Data
Exportand
Data
Im
port
Bulk Data Load
cpimport, LOAD DATA INFILE
Bulk Data Export
mysql client, odbc, jdbc
Integration with MariaDB
ColumnStore cpimport and sql
interface
Bulk Data Load: cpimport
• Fastest way to load data into MariaDB ColumnStore
• Load data from CSV file
cpimport dbName tblName [loadFile]
• Load data from Standard Input
mysql -e 'select * from source_table;' -N db2 | cpimport destination_db
destination_tbl -s 't‘
• Load data from Binary Source file
cpimport -I1 mydb mytable sourcefile.bin
• Multiple tables in can be loaded in parallel by launching multiple jobs
• Read queries continue without being blocked
• Successful cpimport is auto-committed
• In case of errors, entire load is rolled back
Bulk Data Load: cpimport mode 1
Single file Central Input :
Data source at UM
cpimport -m1 mytest mytable
mytable.tbl
cpimport
Name Node
UM Node
Source
Data Node
PM Node
Data Node
PM Node
Data Node
PM Node
Bulk Data Load: cpimport mode 2
Distributed Input:
Data Source at PMs
Partitioned load
file on each PM
cpimport -m2 testdb mytable
/home/mydata/mytable.tbl
cpimport
Name Node
UM Node
Source
Data Node
PM Node
Data Node
PM Node
Data Node
PM Node
Source Source
Distributed Input:
Data Source at PMs
Partitioned load
file on each PM
cpimport -m2 testdb mytable
/home/mydata/mytable.tbl
Bulk load command
at one or more PM
cpimport –m3 testdb mytable
/home/mydata/mytable.tbl
Bulk Data Load: cpimport mode 3
Name Node
UM Node
Source
Data Node
PM Node
Data Node
PM Node
Data Node
PM Node
Source Source
cpimport cpimport cpimport
Traditional way of
importing data into
any MariaDB storage
engine table
Bulk Data Load:
LOAD DATA INFILE
Up to 2 times slower
than cpimport for
large size imports
mysql> load data infile '/tmp/
outfile1.txt' into table destinationTable;
Query OK, 9765625 rows affected
(2 min 20.01 sec)
Records: 9765625 Deleted:
0 Skipped: 0 Warnings: 0
Either success or
error operation can
be rolled back
• Connect with ODBC, JDBC or
mysql client to the UM
• Extract SQL query results in
output file on the UM
Bulk Data Export
Distributed Export Central Export
• Fastest way to do export
• Use LOCAL PM query feature
• Connect ODBC, JDBC or mysql
client to each PM
• Extract SQL query results in
output file on each PM
Resources
Data Warehousing
Selective column
based queries
Large number
of dimensions
High Performance
Analytics On Large
Volume Of Data
Reporting and analysis
on millions or billions
of rows
From datasets
containing millions
to trillions of rows
Terabytes to Petabytes
of datasets
Analytics Require
Complex Joins,
Windowing Functions
Technical Use Cases
Industry Category Use Case
Gaming Behavior Analytics Projecting and predicting user behavior based on past and current data
Advertising Customer Analytics Customer behavior data for market segmentation and predictive analytics.
Advertising Loyalty Analytics Customer analytics focusing on a person’s commitment to a product, company, or brand.
Web,
E-commerce
Click Stream Analytics
Web activity analysis, software testing, market research with analytics on data about the clicks areas of web pages while
web browsing [Deal News]
Marketing Promotional Testing Using marketing and campaign management data to identify the best criteria to be used for a particular marketing offer.
Social Network Network Analytics Relationship analytics among network nodes
Financial Fraud Analytics
Monitoring user financial transactions and identifying patterns of behaviour to predict and detect abnormal or fraudulent
activity to prevent damage to user and institution.
Healthcare Patient Analytics Analyzing patient medical records to identify patterns to be used for improved medical treatment.
Healthcare Clinical Analytics Analyzing clinical data and its impact on patients to identify patterns to be used for improved medical treatment.
Telco
Network and Application
Performance Analytics
Streaming data from network devices and applications enriched with business operations data to uncover actionable
insights for network planning, operations and marketing analytics
Aviation Flight analytics
Proactively project parts replacement, maintenance and air-plane retirement based on real-time and historically collected
flight parameter data [Boeing]
Customer Use Cases
Coming Soon - ColumnStore 1.1
● Text / Blob datatype support
● Bulk Write API Connector
○ Kafka integration
○ Replication integration
○ Custom
● User Defined Aggregate & Window functions.
● Data Redundancy for local storage.
● Installation improvements.
● Performance & stability improvements.
● MariaDB Server 10.2
Thank you
Andrew Hutchings (LinuxJedi)
Senior Software Engineer

More Related Content

What's hot

A short introduction to Vertica
A short introduction to VerticaA short introduction to Vertica
A short introduction to VerticaTommi Siivola
 
GC free coding in @Java presented @Geecon
GC free coding in @Java presented @GeeconGC free coding in @Java presented @Geecon
GC free coding in @Java presented @GeeconPeter Lawrey
 
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...Amazon Web Services
 
How can a data layer help my seo
How can a data layer help my seoHow can a data layer help my seo
How can a data layer help my seoPhil Pearce
 
Implementing Auditing in SQL Server
Implementing Auditing in SQL ServerImplementing Auditing in SQL Server
Implementing Auditing in SQL ServerDavid Dye
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series DatabasePramit Choudhary
 
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...InfluxData
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptxchennakesava44
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®confluent
 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeDatabricks
 
Threat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique BrezinskiThreat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique BrezinskiDatabricks
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
Parallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWParallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWDataWorks Summit
 
How netflix manages petabyte scale apache cassandra in the cloud
How netflix manages petabyte scale apache cassandra in the cloudHow netflix manages petabyte scale apache cassandra in the cloud
How netflix manages petabyte scale apache cassandra in the cloudVinay Kumar Chella
 
微博cache设计谈
微博cache设计谈微博cache设计谈
微博cache设计谈Tim Y
 

What's hot (20)

Apache Solr
Apache SolrApache Solr
Apache Solr
 
A short introduction to Vertica
A short introduction to VerticaA short introduction to Vertica
A short introduction to Vertica
 
GC free coding in @Java presented @Geecon
GC free coding in @Java presented @GeeconGC free coding in @Java presented @Geecon
GC free coding in @Java presented @Geecon
 
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
 
How can a data layer help my seo
How can a data layer help my seoHow can a data layer help my seo
How can a data layer help my seo
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
Implementing Auditing in SQL Server
Implementing Auditing in SQL ServerImplementing Auditing in SQL Server
Implementing Auditing in SQL Server
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series Database
 
Vertica-Database
Vertica-DatabaseVertica-Database
Vertica-Database
 
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptx
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
 
Building Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta LakeBuilding Reliable Data Lakes at Scale with Delta Lake
Building Reliable Data Lakes at Scale with Delta Lake
 
Threat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique BrezinskiThreat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique Brezinski
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
Graph Databases at Netflix
Graph Databases at NetflixGraph Databases at Netflix
Graph Databases at Netflix
 
Parallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSWParallel/Distributed Deep Learning and CDSW
Parallel/Distributed Deep Learning and CDSW
 
How netflix manages petabyte scale apache cassandra in the cloud
How netflix manages petabyte scale apache cassandra in the cloudHow netflix manages petabyte scale apache cassandra in the cloud
How netflix manages petabyte scale apache cassandra in the cloud
 
微博cache设计谈
微博cache设计谈微博cache设计谈
微博cache设计谈
 
Introduction to Azure Data Lake
Introduction to Azure Data LakeIntroduction to Azure Data Lake
Introduction to Azure Data Lake
 

Similar to Big Data Analytics with MariaDB ColumnStore

MariaDB ColumnStore
MariaDB ColumnStoreMariaDB ColumnStore
MariaDB ColumnStoreMariaDB plc
 
04 2017 emea_roadshowmilan_mariadb columnstore
04 2017 emea_roadshowmilan_mariadb columnstore04 2017 emea_roadshowmilan_mariadb columnstore
04 2017 emea_roadshowmilan_mariadb columnstoremlraviol
 
Big Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreBig Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreMariaDB plc
 
MariaDB ColumnStore
MariaDB ColumnStoreMariaDB ColumnStore
MariaDB ColumnStoreMariaDB plc
 
Big Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreBig Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreMariaDB plc
 
Big Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreBig Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreMariaDB plc
 
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStoreBig Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStoreMatt Stubbs
 
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...Kangaroot
 
[db tech showcase OSS 2017] A25: Replacing Oracle Database at DBS Bank by Mar...
[db tech showcase OSS 2017] A25: Replacing Oracle Database at DBS Bank by Mar...[db tech showcase OSS 2017] A25: Replacing Oracle Database at DBS Bank by Mar...
[db tech showcase OSS 2017] A25: Replacing Oracle Database at DBS Bank by Mar...Insight Technology, Inc.
 
[db tech showcase OSS 2017] A23: Analytics with MariaDB ColumnStore by MariaD...
[db tech showcase OSS 2017] A23: Analytics with MariaDB ColumnStore by MariaD...[db tech showcase OSS 2017] A23: Analytics with MariaDB ColumnStore by MariaD...
[db tech showcase OSS 2017] A23: Analytics with MariaDB ColumnStore by MariaD...Insight Technology, Inc.
 
Introduction of MariaDB AX / TX
Introduction of MariaDB AX / TXIntroduction of MariaDB AX / TX
Introduction of MariaDB AX / TXGOTO Satoru
 
Jss 2015 in memory and operational analytics
Jss 2015   in memory and operational analyticsJss 2015   in memory and operational analytics
Jss 2015 in memory and operational analyticsDavid Barbarin
 
[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analytics[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analyticsGUSS
 
03 2017Emea_RoadshowMilan-WhatsNew-Mariadbserver10_2andmaxscale 2_1
03 2017Emea_RoadshowMilan-WhatsNew-Mariadbserver10_2andmaxscale 2_103 2017Emea_RoadshowMilan-WhatsNew-Mariadbserver10_2andmaxscale 2_1
03 2017Emea_RoadshowMilan-WhatsNew-Mariadbserver10_2andmaxscale 2_1mlraviol
 
What’s New in MariaDB Server 10.2
What’s New in MariaDB Server 10.2What’s New in MariaDB Server 10.2
What’s New in MariaDB Server 10.2MariaDB plc
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1MariaDB plc
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1MariaDB plc
 
Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data. Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data. Keshav Murthy
 
Top 10 tips for Oracle performance
Top 10 tips for Oracle performanceTop 10 tips for Oracle performance
Top 10 tips for Oracle performanceGuy Harrison
 
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...EDB
 

Similar to Big Data Analytics with MariaDB ColumnStore (20)

MariaDB ColumnStore
MariaDB ColumnStoreMariaDB ColumnStore
MariaDB ColumnStore
 
04 2017 emea_roadshowmilan_mariadb columnstore
04 2017 emea_roadshowmilan_mariadb columnstore04 2017 emea_roadshowmilan_mariadb columnstore
04 2017 emea_roadshowmilan_mariadb columnstore
 
Big Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreBig Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStore
 
MariaDB ColumnStore
MariaDB ColumnStoreMariaDB ColumnStore
MariaDB ColumnStore
 
Big Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreBig Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStore
 
Big Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreBig Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStore
 
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStoreBig Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
 
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...
 
[db tech showcase OSS 2017] A25: Replacing Oracle Database at DBS Bank by Mar...
[db tech showcase OSS 2017] A25: Replacing Oracle Database at DBS Bank by Mar...[db tech showcase OSS 2017] A25: Replacing Oracle Database at DBS Bank by Mar...
[db tech showcase OSS 2017] A25: Replacing Oracle Database at DBS Bank by Mar...
 
[db tech showcase OSS 2017] A23: Analytics with MariaDB ColumnStore by MariaD...
[db tech showcase OSS 2017] A23: Analytics with MariaDB ColumnStore by MariaD...[db tech showcase OSS 2017] A23: Analytics with MariaDB ColumnStore by MariaD...
[db tech showcase OSS 2017] A23: Analytics with MariaDB ColumnStore by MariaD...
 
Introduction of MariaDB AX / TX
Introduction of MariaDB AX / TXIntroduction of MariaDB AX / TX
Introduction of MariaDB AX / TX
 
Jss 2015 in memory and operational analytics
Jss 2015   in memory and operational analyticsJss 2015   in memory and operational analytics
Jss 2015 in memory and operational analytics
 
[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analytics[JSS2015] In memory and operational analytics
[JSS2015] In memory and operational analytics
 
03 2017Emea_RoadshowMilan-WhatsNew-Mariadbserver10_2andmaxscale 2_1
03 2017Emea_RoadshowMilan-WhatsNew-Mariadbserver10_2andmaxscale 2_103 2017Emea_RoadshowMilan-WhatsNew-Mariadbserver10_2andmaxscale 2_1
03 2017Emea_RoadshowMilan-WhatsNew-Mariadbserver10_2andmaxscale 2_1
 
What’s New in MariaDB Server 10.2
What’s New in MariaDB Server 10.2What’s New in MariaDB Server 10.2
What’s New in MariaDB Server 10.2
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
 
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
What's New in MariaDB Server 10.2 and MariaDB MaxScale 2.1
 
Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data. Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data.
 
Top 10 tips for Oracle performance
Top 10 tips for Oracle performanceTop 10 tips for Oracle performance
Top 10 tips for Oracle performance
 
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
 

More from MariaDB plc

MariaDB Paris Workshop 2023 - MaxScale 23.02.x
MariaDB Paris Workshop 2023 - MaxScale 23.02.xMariaDB Paris Workshop 2023 - MaxScale 23.02.x
MariaDB Paris Workshop 2023 - MaxScale 23.02.xMariaDB plc
 
MariaDB Paris Workshop 2023 - Newpharma
MariaDB Paris Workshop 2023 - NewpharmaMariaDB Paris Workshop 2023 - Newpharma
MariaDB Paris Workshop 2023 - NewpharmaMariaDB plc
 
MariaDB Paris Workshop 2023 - Cloud
MariaDB Paris Workshop 2023 - CloudMariaDB Paris Workshop 2023 - Cloud
MariaDB Paris Workshop 2023 - CloudMariaDB plc
 
MariaDB Paris Workshop 2023 - MariaDB Enterprise
MariaDB Paris Workshop 2023 - MariaDB EnterpriseMariaDB Paris Workshop 2023 - MariaDB Enterprise
MariaDB Paris Workshop 2023 - MariaDB EnterpriseMariaDB plc
 
MariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance OptimizationMariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance OptimizationMariaDB plc
 
MariaDB Paris Workshop 2023 - MaxScale
MariaDB Paris Workshop 2023 - MaxScale MariaDB Paris Workshop 2023 - MaxScale
MariaDB Paris Workshop 2023 - MaxScale MariaDB plc
 
MariaDB Paris Workshop 2023 - novadys presentation
MariaDB Paris Workshop 2023 - novadys presentationMariaDB Paris Workshop 2023 - novadys presentation
MariaDB Paris Workshop 2023 - novadys presentationMariaDB plc
 
MariaDB Paris Workshop 2023 - DARVA presentation
MariaDB Paris Workshop 2023 - DARVA presentationMariaDB Paris Workshop 2023 - DARVA presentation
MariaDB Paris Workshop 2023 - DARVA presentationMariaDB plc
 
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server MariaDB plc
 
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-BackupMariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-BackupMariaDB plc
 
Einführung : MariaDB Tech und Business Update Hamburg 2023
Einführung : MariaDB Tech und Business Update Hamburg 2023Einführung : MariaDB Tech und Business Update Hamburg 2023
Einführung : MariaDB Tech und Business Update Hamburg 2023MariaDB plc
 
Hochverfügbarkeitslösungen mit MariaDB
Hochverfügbarkeitslösungen mit MariaDBHochverfügbarkeitslösungen mit MariaDB
Hochverfügbarkeitslösungen mit MariaDBMariaDB plc
 
Die Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise ServerDie Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise ServerMariaDB plc
 
Global Data Replication with Galera for Ansell Guardian®
Global Data Replication with Galera for Ansell Guardian®Global Data Replication with Galera for Ansell Guardian®
Global Data Replication with Galera for Ansell Guardian®MariaDB plc
 
Introducing workload analysis
Introducing workload analysisIntroducing workload analysis
Introducing workload analysisMariaDB plc
 
Under the hood: SkySQL monitoring
Under the hood: SkySQL monitoringUnder the hood: SkySQL monitoring
Under the hood: SkySQL monitoringMariaDB plc
 
Introducing the R2DBC async Java connector
Introducing the R2DBC async Java connectorIntroducing the R2DBC async Java connector
Introducing the R2DBC async Java connectorMariaDB plc
 
MariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB plc
 
Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBMariaDB plc
 
The architecture of SkySQL
The architecture of SkySQLThe architecture of SkySQL
The architecture of SkySQLMariaDB plc
 

More from MariaDB plc (20)

MariaDB Paris Workshop 2023 - MaxScale 23.02.x
MariaDB Paris Workshop 2023 - MaxScale 23.02.xMariaDB Paris Workshop 2023 - MaxScale 23.02.x
MariaDB Paris Workshop 2023 - MaxScale 23.02.x
 
MariaDB Paris Workshop 2023 - Newpharma
MariaDB Paris Workshop 2023 - NewpharmaMariaDB Paris Workshop 2023 - Newpharma
MariaDB Paris Workshop 2023 - Newpharma
 
MariaDB Paris Workshop 2023 - Cloud
MariaDB Paris Workshop 2023 - CloudMariaDB Paris Workshop 2023 - Cloud
MariaDB Paris Workshop 2023 - Cloud
 
MariaDB Paris Workshop 2023 - MariaDB Enterprise
MariaDB Paris Workshop 2023 - MariaDB EnterpriseMariaDB Paris Workshop 2023 - MariaDB Enterprise
MariaDB Paris Workshop 2023 - MariaDB Enterprise
 
MariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance OptimizationMariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance Optimization
 
MariaDB Paris Workshop 2023 - MaxScale
MariaDB Paris Workshop 2023 - MaxScale MariaDB Paris Workshop 2023 - MaxScale
MariaDB Paris Workshop 2023 - MaxScale
 
MariaDB Paris Workshop 2023 - novadys presentation
MariaDB Paris Workshop 2023 - novadys presentationMariaDB Paris Workshop 2023 - novadys presentation
MariaDB Paris Workshop 2023 - novadys presentation
 
MariaDB Paris Workshop 2023 - DARVA presentation
MariaDB Paris Workshop 2023 - DARVA presentationMariaDB Paris Workshop 2023 - DARVA presentation
MariaDB Paris Workshop 2023 - DARVA presentation
 
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
 
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-BackupMariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
 
Einführung : MariaDB Tech und Business Update Hamburg 2023
Einführung : MariaDB Tech und Business Update Hamburg 2023Einführung : MariaDB Tech und Business Update Hamburg 2023
Einführung : MariaDB Tech und Business Update Hamburg 2023
 
Hochverfügbarkeitslösungen mit MariaDB
Hochverfügbarkeitslösungen mit MariaDBHochverfügbarkeitslösungen mit MariaDB
Hochverfügbarkeitslösungen mit MariaDB
 
Die Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise ServerDie Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise Server
 
Global Data Replication with Galera for Ansell Guardian®
Global Data Replication with Galera for Ansell Guardian®Global Data Replication with Galera for Ansell Guardian®
Global Data Replication with Galera for Ansell Guardian®
 
Introducing workload analysis
Introducing workload analysisIntroducing workload analysis
Introducing workload analysis
 
Under the hood: SkySQL monitoring
Under the hood: SkySQL monitoringUnder the hood: SkySQL monitoring
Under the hood: SkySQL monitoring
 
Introducing the R2DBC async Java connector
Introducing the R2DBC async Java connectorIntroducing the R2DBC async Java connector
Introducing the R2DBC async Java connector
 
MariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introduction
 
Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDB
 
The architecture of SkySQL
The architecture of SkySQLThe architecture of SkySQL
The architecture of SkySQL
 

Recently uploaded

Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 

Recently uploaded (20)

Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 

Big Data Analytics with MariaDB ColumnStore

  • 3. MariaDB ColumnStore • GPLv2 Open Source • Columnar, Massively Parallel MariaDB Storage Engine • Scalable, high-performance analytics platform • Built in redundancy and high availability • Runs on premise, on AWS cloud • Full SQL syntax and capabilities regardless of platform Big Data Sources Analytics Insight MariaDB ColumnStore . . . Node 1 Node 2 Node 3 Node N Local / AWS® / GlusterFS ® ELT Tools BI Tools
  • 4. MariaDB ColumnStore High performance columnar storage engine that support wide variety of analytical use cases with SQL in a highly scalable distributed environments Parallel query processing for distributed environments Faster, More Efficient Queries Single SQL Interface for OLTP and analytics Easier Enterprise Analytics Power of SQL and Freedom of Open Source to Big Data Analytics Better Price Performance
  • 5. OLTP/NoSQL Workloads Suited for reporting or analysis of millions-billions of rows from data sets containing millions-trillions of rows. OLAP/Analytic/ Reporting Workloads Workload – Query Vision/Scope 1 100 10,000 10-100GB 10,000,000,000 1-10TB 1,000,000 100,000,000 100-1,000GB
  • 6. Row-oriented vs. Column-oriented format • Row oriented – Rows stored sequentially in a file – Scans through every record row by row • Column oriented: – Each column is stored in a separate file – Scans only the relevant columns ID Fname Lname State Zip Phone Age Sex 1 Bugs Bunny NY 11217 (718) 938-3235 34 M 2 Yosemite Sam CA 95389 (209) 375-6572 52 M 3 Daffy Duck NY 10013 (212) 227-1810 35 M 4 Elmer Fudd ME 04578 (207) 882-7323 43 M 5 Witch Hazel MA 01970 (978) 744-0991 57 F ID 1 2 3 4 5 Fname Bugs Yosemite Daffy Elmer Witch Lname Bunny Sam Duck Fudd Hazel State NY CA NY ME MA Zip 11217 95389 10013 04578 01970 Phone (718) 938-3235 (209) 375-6572 (212) 227-1810 (207) 882-7323 (978) 744-0991 Age 34 52 35 43 57 Sex M M M M F SELECT Fname FROM People WHERE State = 'NY'
  • 7. Analytics • In-database distributed analytics with complex join, aggregation, window functions • Cross Engine Join allows for queries to be executed referencing both columnstore and non-columnstore tables. • Extensible User Defined Functions allow creation of specialized logic executed at PM level. • Standard MariaDB Connectors provide for out of the box integration with: – BI Tools (Tableau, Pentaho, ..) – Custom Application Code (Java, Scala, C#, Python, ..) – Data Processing Frameworks (R, Spark, Numpy, ..) Item ID Server_date Revenue 1 2017-02-01 20,000.0 1 2017-02-02 5,001.00 2 2017-02-01 15,000.0 2 2017-02-04 34,029.0 2 2017-02-05 7,138.00 3 2017-02-01 17,250.0 3 2017-02-03 25,010.0 3 2017-02-04 21,034.0 3 2017-02-05 4,120.00 Running Average 20,000.00 12,500.50 15,000.00 34,029.00 20,583.50 17,250.00 25,010.00 23,022.00 12,577.00 Window Function Example: Daily Running Average Revenue by Item SELECT item_id, server_date, daily_revenue, AVG(revenue) OVER (PARTITION BY item_id ORDER BY server_date RANGE INTERVAL 1 DAY PRECEDING ) running_avg FROM web_item_sales BI Tool Custom Big Data App Data Processing Framework JDBC / ODBC / Connector
  • 8. Enterprise Grade • Enterprise Grade Security – SSL, role based access, auditability. – MaxScale database firewall • Deployment Flexibility – Run on commodity Linux servers on premise or in the cloud. – AWS optimized AMI Image. – Add horizontal capacity as you grow. • High Availability – Automatic UM failover – Automatic PM failover with distributed data attachment across all PMs in SAN and EBS environment Shared-Nothing Distributed Data Storage Compressed by default User Module (UM) Performance Module (PM) Data Storage Load Balancer - MaxScale
  • 9. MariaDB ColumnStore Architecture Columnar Distributed Data Storage Local Storage | SAN | NAS | EBS | Gluster FS BI Tool SQL Client Custom Big Data App Application MariaDB SQL Front End Distributed Query Engine Data Storage
  • 10. MariaDB ColumnStore Shared Nothing Distributed Data Storage SQL Column Primitives User Module Performance Module UM PM • Query received and parsed by MariaDB Front End on UM • Storage Engine Plugin breaks down query in primitive operations and distributes across PM • Primitives processed on PM • One thread working on a range of rows • Execute column restrictions and projections • Execute group by/aggregation against local data • Each PM work on Primitives in parallel and fully distributed • Each primitive executes in a fraction of a second • Return intermediate results to UM Primitives ↓↓↓↓ Intermediate ↑↑Results↑↑
  • 11. MariaDB ColumnStore MariaDB ColumnStore uses standard “Engine=columnstore” syntax mysql> use tpcds_djoshi Database changed mysql> select count(*) from store_sales; +----------+ | count(*) | +----------+ | 2880404 | +----------+ 1 row in set (1.68 sec) mysql> describe warehouse; +-------------------+--------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------------------+--------------+------+-----+---------+-------+ | w_warehouse_sk | int(11) | NO | | NULL | | | w_warehouse_id | char(16) | NO | | NULL | | | w_warehouse_name | varchar(20) | YES | | NULL | | | w_warehouse_sq_ft | int(11) | YES | | NULL | | | w_street_number | char(10) | YES | | NULL | | | w_street_name | varchar(60) | YES | | NULL | | | w_street_type | char(15) | YES | | NULL | | | w_suite_number | char(10) | YES | | NULL | | | w_city | varchar(60) | YES | | NULL | | | w_county | varchar(30) | YES | | NULL | | | w_state | char(2) | YES | | NULL | | | w_zip | char(10) | YES | | NULL | | | w_country | varchar(20) | YES | | NULL | | | w_gmt_offset | decimal(5,2) | YES | | NULL | | +-------------------+--------------+------+-----+---------+-------+ 14 rows in set (0.05 sec) CREATE TABLE `game_warehouse`.`dim_title` ( `id` INT, `name` VARCHAR(45), `publisher` VARCHAR(45), `release_date` DATE, `language` INT, `platform_name` VARCHAR(45), `version` VARCHAR(45) ) ENGINE=columnstore; Uses custom scalable columnar architecture
  • 12. MariaDB ColumnStore mysql> use tpcds_djoshi Database changed mysql> select count(*) from store_sales; +----------+ | count(*) | +----------+ | 2880404 | +----------+ 1 row in set (1.68 sec) mysql> describe warehouse; +-------------------+--------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------------------+--------------+------+-----+---------+-------+ | w_warehouse_sk | int(11) | NO | | NULL | | | w_warehouse_id | char(16) | NO | | NULL | | | w_warehouse_name | varchar(20) | YES | | NULL | | | w_warehouse_sq_ft | int(11) | YES | | NULL | | | w_street_number | char(10) | YES | | NULL | | | w_street_name | varchar(60) | YES | | NULL | | | w_street_type | char(15) | YES | | NULL | | | w_suite_number | char(10) | YES | | NULL | | | w_city | varchar(60) | YES | | NULL | | | w_county | varchar(30) | YES | | NULL | | | w_state | char(2) | YES | | NULL | | | w_zip | char(10) | YES | | NULL | | | w_country | varchar(20) | YES | | NULL | | | w_gmt_offset | decimal(5,2) | YES | | NULL | | +-------------------+--------------+------+-----+---------+-------+ 14 rows in set (0.05 sec) MariaDB Front End Standard ANSI SQL
  • 13. Process Functionality Value MariaDB • Hosts MariaDB • Connection management • SQL parsing & optimization ✓ Familiar DBMS interface ✓ Leverages existing partner integrations ✓ Delivers rich SQL syntax support Extent Map • Abstracts physical and logical storage • Metadata store ✓ Enables partition elimination ExeMgr • Work distribution • Final results management and aggregation ✓ Multi-threaded to take advantage of multi-core HW platforms User Module at a Glance
  • 14. Process Functionality Value PrimProc • Scale-out cache management • Distributed scan, filter, join and aggregation operations • Resource management ✓ Independent scalability and tunable performance ✓ Multi-threaded to take advantage of multi-core HW platforms Data • High Speed Bulk Load • Transactional DML and DDL • Online schema extensions ✓ Non-blocking read enabled ✓ Multi-threaded to take advantage of multi-core HW platforms Performance Module at a Glance
  • 15. Compression with Data Storage Layer Blocks (8KB) Extent1 (8MB~64MB 8 million rows) Logical Layer Segment File1 (maps to an Extent) Physical Layer Compression Chunks
  • 16. Key meta-structure that powers MariaDB ColumnStore’s performance A catalog of all extents • Minimum and maximum values for a column’s data within an extent • Corresponding blocks for each extent Master copy of the Extent Map on primary PM node Upon system startup, copied to all other UM and PM nodes for disaster recovery and failover purposes Extent Map resident in memory for quick access at all nodes As extents modified, updates broadcasted to all participating nodes Stores about 64 bytes for each 8-64 Mbytes on disk Extent Map
  • 17. Extent Map When performing queries: • Eliminate the extents by taking into consideration only the extents for the column in join and filter conditions • Use the minimum and maximum value for the extents for join columns to filter the columns and eliminate extent Multiple columns can be used together for partition elimination Transitive properties apply, i.e. a filter on a dimension column (date, for example) can allow for partition elimination on fact table
  • 18. • 8-byte fixed length token (pointer). • A variable length value stored at the location identified by the pointer. Data Types 1-byte Field with 8192 values per 8k block 2-byte Field with 4096 values per 8k block 4-byte Field with 2048 values per 8k block 8-byte Field with 1024 values per 8k block Dictionary structure made up of 2 files/extents with: At the physical layer, all columns are stored as:
  • 19. • Varchar(8) or larger • Char(9) or larger Data Types 1-byte Field Examples TinyInt, Char(1) 2-byte Field Examples SmallInt, Char(2) 4-byte Field Examples Int, Char(3), Char(4), date, float 8-byte Field Examples BigInt, Char(5-8),datetime, real/double Dictionary Examples At the physical layer, all columns are stored as:
  • 20. Sizing Minimum Spec UM 4 core, 32 G RAM PM 4 core, 16 G RAM Typical Server spec PM 8 core 64G RAM UM 8 core, 64G RAM Data Storage External Data Volumes • Maximum 2 data volume per IO channel per PM node server • up to 2TB on the disk per data volume ≈ Max 4 TB per PM node Local disk Up to 2TB on the disk per PM node server DETAILED SIZING GUIDE based on data size and workload
  • 21. Sizing - Example • MariaDB ColumnStore 60TB uncompressed data = 6TB compressed data at 10x compression • 2UM - 8 core 512G(based on work load) • 6 TB compressed = 3 data volume (at 2TB per volume) - with 1 data volume per PM node - 3PMs • Data growth - 2TB per month, Data retention - 2 years - Plan for 2TB X24 = 48 TB additional - 48 TB = 4.8TB compressed ≈ 3 data volume(at 2TB per volume) with 1 data volume per PM node - 3 additional PMs • Total 6 PMs, 2 UMs
  • 23. MAX RANK MIN DENSE_RANK COUNT PERCENT_RANK SUM NTH_VALUE AVG FIRST_VALUE VARIANCE LAST_VALUE VAR_POP CUME_DIST VAR_SAMP LAG STD LEAD STDDEV NTILE STDDEV_POP PERCENTILE_CONT STDDEV_SAMP PERCENTILE_DISC ROW_NUMBER MEDIAN • Aggregate over a series of related rows • Simplified function for complex statistical analytics over sliding window per row - Cumulative, moving or centered aggregates - Simple Statistical functions like rank, max, min, average, median - More complex functions such as distribution, percentile, lag, lead - Without running complex sub-queries Windowing Functions Source : InfiniDB SQL Syntax Guide
  • 24. Top N Visitors for each Month Window Function Example Total for Each Visitor by Month Top 1 : Time_rank = 1 Top 2 : Time_rank <= 2 Top N : Time_rank <= N
  • 25. Data Modeling Best Practices Star-schema optimizations are generally a good idea Conservative data typing is very important Especially around fixed-length vs. dictionary boundary (8 bytes) IP Address vs. IP Number Break down compound fields into individual fields: Trivializes searching for sub-fields Can avoid dictionary overhead Cost to re-assemble is generally small
  • 26. Horizontal Partition: 8 Million Rows Extent 2 Horizontal Partition: 8 Million Rows Extent 3 Horizontal Partition: 8 Million Rows Extent 1 Storage Architecture reduces I/O • Only touch column files that are in filter, projection, group by, and join conditions • Eliminate disk block touches to partitions outside filter and join conditions Extent 1: ShipDate: 2016-01-12 - 2016-03-05 Extent 2: ShipDate: 2016-03-05 - 2016-09-23 Extent 3: ShipDate: 2016-09-24 - 2017-01-06 SELECT Item, sum(Quantity) FROM Orders WHERE ShipDate between ‘2016-01-01’ and ‘2016-01-31’ GROUP BY Item Extent Elimination Id OrderId Line Item Quantity Price Supplier ShipDate ShipMode 1 1 1 Laptop 5 1000 Dell 2016-01-12 G 2 1 2 Monitor 5 200 LG 2016-01-13 G 3 2 1 Mouse 1 20 Logitech 2016-02-05 M 4 3 1 Laptop 3 1600 Apple 2016-01-31 P ... ... ... ... ... ... ... ... ... 8M 2016-03-05 8M+1 2016-03-05 ... ... ... ... ... ... ... ... ... 16M 2016-09-23 16M+1 2016-09-24 ... ... ... ... ... ... ... ... ... 24M 2017-01-06 ELIMINATED PARTITION ELIMINATED PARTITION
  • 27. Tuning Commands MariaDB [test]> select count(*) from t1 where i = 5; +----------+ | count(*) | +----------+ | 2200000 | +----------+ 1 row in set (0.27 sec) MariaDB [test]> select calGetStats()G *************************** 1. row *************************** calGetStats(): Query Stats: MaxMemPct-0; NumTempFiles-0; TempFileSpace-0B; ApproxPhyI/O-11042; CacheI/O-11042; BlocksTouched-11042; PartitionBlocksEliminated-0; MsgBytesIn-332KB; MsgBytesOut-3KB; Mode-Distributed 1 row in set (0.00 sec) calGetStats: Information On The Last Query Executed Within A Given Session
  • 28. MariaDB [test]> select calSetTrace(1); +----------------+ | calSetTrace(1) | +----------------+ | 0 | +----------------+ 1 row in set (0.00 sec) MariaDB [test]> select d.name dept_name, -> count(*) emp_count, -> sum(e.salary) salary_cost -> from emp e -> join i_dept d on e.dept_id = d.dept_id -> group by dept_name; +-------------+-----------+-------------+ | dept_name | emp_count | salary_cost | +-------------+-----------+-------------+ | Engineering | 2 | 2500 | | Sales | 2 | 3800 | +-------------+-----------+-------------+ 2 rows in set, 1 warning (0.03 sec) Tuning Commands calGetTrace: Detailed distributed query execution plan MariaDB [test]> select calGetTrace()G *************************** 1. row *************************** calGetTrace(): Desc Mode Table TableOID ReferencedColumns PIO LIO PBE Elapsed Rows CES UM - - - - - - 0.000 2 BPS PM e 3013 (dept_id,salary) 2 4 0 0.008 2 HJS PM e-d 3013 - - - - ----- - TAS UM - - - - - - 0.000 2 1 row in set (0.00 sec)
  • 29. Tuning Commands Query Statistics Users can view the query statistics by selecting the rows from the query stats table in the infinidb_querystats schema. Example 1 Example 2 List execution time, rows returned for all the select queries within the past 12 hours select queryid, query, endtime-starttime, rows from querystats where starttime >= now() - interval 12 hour and querytype = 'SELECT'; List the average, min and max running time of all the INSERT SELECT queries within the past 12 hours select min(endtime-starttime), max(endtime-starttime), avg(endtimestarttime) from querystats where querytype='INSERT SELECT' and starttime >= now() - interval 12 hour;
  • 30. calpont> getActiveSQLStatements getactivesqlstatements Wed Oct 7 08:38:32 2015 Get List of Active SQL Statements ================================= Start Time Time (hh:mm:ss) Session ID SQL Statement ---------------- ---------------- -------------------- ----------------------------------------------------------- Oct 7 08:38:30 00:00:03 73 select c_name,sum(lo_revenue) from customer, lineorder where lo_custkey = c_custkey andc_custkey = 6 group by c_name getActiveSQLStatements: List Active SQL Statements within the System mysql> show processlist; +----+------+-----------+-------+---------+------+-------+------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+------+-----------+-------+---------+------+-------+------------------+ | 73 | root | localhost | ssb10 | Query | 0 | NULL | show processlist | +----+------+-----------+-------+---------+------+-------+------------------+ Tuning Commands
  • 32. Data Exportand Data Im port Bulk Data Load cpimport, LOAD DATA INFILE Bulk Data Export mysql client, odbc, jdbc Integration with MariaDB ColumnStore cpimport and sql interface
  • 33. Bulk Data Load: cpimport • Fastest way to load data into MariaDB ColumnStore • Load data from CSV file cpimport dbName tblName [loadFile] • Load data from Standard Input mysql -e 'select * from source_table;' -N db2 | cpimport destination_db destination_tbl -s 't‘ • Load data from Binary Source file cpimport -I1 mydb mytable sourcefile.bin • Multiple tables in can be loaded in parallel by launching multiple jobs • Read queries continue without being blocked • Successful cpimport is auto-committed • In case of errors, entire load is rolled back
  • 34. Bulk Data Load: cpimport mode 1 Single file Central Input : Data source at UM cpimport -m1 mytest mytable mytable.tbl cpimport Name Node UM Node Source Data Node PM Node Data Node PM Node Data Node PM Node
  • 35. Bulk Data Load: cpimport mode 2 Distributed Input: Data Source at PMs Partitioned load file on each PM cpimport -m2 testdb mytable /home/mydata/mytable.tbl cpimport Name Node UM Node Source Data Node PM Node Data Node PM Node Data Node PM Node Source Source
  • 36. Distributed Input: Data Source at PMs Partitioned load file on each PM cpimport -m2 testdb mytable /home/mydata/mytable.tbl Bulk load command at one or more PM cpimport –m3 testdb mytable /home/mydata/mytable.tbl Bulk Data Load: cpimport mode 3 Name Node UM Node Source Data Node PM Node Data Node PM Node Data Node PM Node Source Source cpimport cpimport cpimport
  • 37. Traditional way of importing data into any MariaDB storage engine table Bulk Data Load: LOAD DATA INFILE Up to 2 times slower than cpimport for large size imports mysql> load data infile '/tmp/ outfile1.txt' into table destinationTable; Query OK, 9765625 rows affected (2 min 20.01 sec) Records: 9765625 Deleted: 0 Skipped: 0 Warnings: 0 Either success or error operation can be rolled back
  • 38. • Connect with ODBC, JDBC or mysql client to the UM • Extract SQL query results in output file on the UM Bulk Data Export Distributed Export Central Export • Fastest way to do export • Use LOCAL PM query feature • Connect ODBC, JDBC or mysql client to each PM • Extract SQL query results in output file on each PM
  • 40. Data Warehousing Selective column based queries Large number of dimensions High Performance Analytics On Large Volume Of Data Reporting and analysis on millions or billions of rows From datasets containing millions to trillions of rows Terabytes to Petabytes of datasets Analytics Require Complex Joins, Windowing Functions Technical Use Cases
  • 41. Industry Category Use Case Gaming Behavior Analytics Projecting and predicting user behavior based on past and current data Advertising Customer Analytics Customer behavior data for market segmentation and predictive analytics. Advertising Loyalty Analytics Customer analytics focusing on a person’s commitment to a product, company, or brand. Web, E-commerce Click Stream Analytics Web activity analysis, software testing, market research with analytics on data about the clicks areas of web pages while web browsing [Deal News] Marketing Promotional Testing Using marketing and campaign management data to identify the best criteria to be used for a particular marketing offer. Social Network Network Analytics Relationship analytics among network nodes Financial Fraud Analytics Monitoring user financial transactions and identifying patterns of behaviour to predict and detect abnormal or fraudulent activity to prevent damage to user and institution. Healthcare Patient Analytics Analyzing patient medical records to identify patterns to be used for improved medical treatment. Healthcare Clinical Analytics Analyzing clinical data and its impact on patients to identify patterns to be used for improved medical treatment. Telco Network and Application Performance Analytics Streaming data from network devices and applications enriched with business operations data to uncover actionable insights for network planning, operations and marketing analytics Aviation Flight analytics Proactively project parts replacement, maintenance and air-plane retirement based on real-time and historically collected flight parameter data [Boeing] Customer Use Cases
  • 42. Coming Soon - ColumnStore 1.1 ● Text / Blob datatype support ● Bulk Write API Connector ○ Kafka integration ○ Replication integration ○ Custom ● User Defined Aggregate & Window functions. ● Data Redundancy for local storage. ● Installation improvements. ● Performance & stability improvements. ● MariaDB Server 10.2
  • 43. Thank you Andrew Hutchings (LinuxJedi) Senior Software Engineer