SlideShare a Scribd company logo
ColumnStore
OpenSource Engine for
Analytics/BI
Bruno Šimić
Solutions Engineer
Data and big data
OLTP vs. OLAP
OLTP
On-line Transaction Processing
• large number of short on-line
transactions (INSERT, UPDATE,
DELETE)
• very fast query processing, maintaining
data integrity in multi-access
environments
• effectiveness measured by number of
transactions per second
• operational (detailed and current) data
• OLTPs are the original data source
OLAP
On-line Analytical Processing
• characterized by low volume of
concurrent transactions
• complex queries, often involving
aggregations
• response time is effectiveness measure
• data is aggregated, historical and stored
in multi-dimensional schemas
• OLAP data comes from the various
OLTP Databases
Rows/DataSize Scope
1 100 10,000 1,000,000 100,000,000 10,000,000,000 100,000,000,000
10-100GB 100-1000GB 1-10TB 10-100TB...PB
MariaDB OLTP MariaDB ColumnStore OLAP
1. Descriptive Analytics
What is Happening?
Traditional OLAP
2. Diagnostic Analytics
Why did it Happen?
3. Predictive Analytics
What is likely to happen?
4. Prescriptive Analytics
What should I do about it?
Big Data Analytics
Social Media
Sensors
Node 1
Biometrics
Mobile
Data Collection
MariaDB
ColumnStore
Data Processing
BI Tools, Data Science
Applications
Connectors,
SPARK Integration etc
MariaDB
MaxScale
Transactional,
Operational
Analytics
Insight
MariaDB ColumnStore Architecture
Columnar Distributed Data Storage
Local Storage | SAN | EBS | Gluster FS
BI Tool SQL Client Custom
Big Data
App
Application
MariaDB SQL
Front End
Distributed
Query Engine
Data
Storage
User Module (UM)
Performance
Module (PM)
User Modules
• mysqld - The MariaDB server
• ExeMgr - MariaDB’s interface to
ColumnStore
• cpimport - high-performance data
import
Query Processing - UM
• SQL Operations are translated into
thousands of Primitives
• Parallel/Distributed SQL
• Extensible with Parallel/Distributed
UDFs
• Query is parsed by mysqld on UM node
• Parsed query handed over to ExeMgr
on UM node
• ExecMgr breaks down the query in
primitive operations
MariaDB SQL
Front End
User Module (UM)
Performance Modules
• PrimProc - Primitives Processor
• WriteEngineServ - Database file writing
processor
• DMLProc - DML writes processor
• DDLProc - DDL processor
Query Processing - PM
• Primitives are processed on PM
• One thread working on a range of rows
• Typically 1/2 million rows, stored in a
few hundred blocks of data
• Execute all column operations required
(restriction and projection)
• Execute any group by/aggregation
against local data
• Each primitive executes in a fraction of
a second
• Primitives are run in parallel and fully
distributed
Distributed
Query Engine
Performance
Module (PM)
Storage Architecture
• Columnar storage
– Each column stored as separate file
– No index management for query
performance tuning
– Online Schema changes: Add new column
without impacting running queries
• Automatic horizontal partitioning
– Logical partition every 8 Million rows
– In memory metadata of partition min and max
– No partition management for query
performance tuning
• Compression
– Accelerate decompression rate
– Reduce I/O for compressed blocks
Column 1
Extent 1 (8 million rows, 8MB~64MB)
Extent 2 (8 million rows)
Extent M (8 million rows)
Column 2 Column 3 ... Column N
Data automatically arranged by
• Column – Acts as Vertical Partitioning
• Extents – Acts as horizontal partition
Vertical
Partition
Horizontal
Partition
...
Vertical
Partition
Vertical
Partition
Vertical
Partition
Horizontal
Partition
Horizontal
Partition
High Performance Data Ingestion
• Fully parallel high
speed data load
– Parallel data loads on all PMs simultaneously
– Multiple tables in can be loaded simultaneously
– Read queries continue without being blocked
• Micro-batch loading
for real-time data flow
Column 1
Extent 1 (8 million rows, 8MB~64MB)
Extent 2 (8 million rows)
Extent M (8 million rows)
Column 2 ... Column N
Horizontal
Partition
...
Horizontal
Partition
Horizontal
Partition
High Water Mark
New Data being loaded
Dataaccessedby
runningqueries
Column-oriented storage
Differences to row-oriented storage
Row oriented:
rows stored
sequentially in a file.
Column oriented:
Each column is stored
in a separate file. Each
column for a given
row is at the same
offset.
Row-oriented vs. Column-oriented format
Key Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
Key
1
2
3
4
5
Fname
Bugs
Yosemite
Daffy
Elmer
Witch
Lname
Bunny
Sam
Duck
Fudd
Hazel
State
NY
CA
NY
ME
MA
Zip
11217
95389
10013
04578
01970
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
(978) 744-0991
Age
34
52
35
43
57
Sex
M
M
M
M
F
Row oriented:
new rows appended to
the end.
Column oriented:
new value added to
each file.
Single-Row Operations - Insert
Key Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
6 Marvin Martian CA 91602 (818) 761-9964 26 M
Key
1
2
3
4
5
Fname
Bugs
Yosemite
Daffy
Elmer
Witch
Lname
Bunny
Sam
Duck
Fudd
Hazel
State
NY
CA
NY
ME
MA
Zip
11217
95389
10013
04578
01970
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
(978) 744-0991
Age
34
52
35
43
57
Sex
M
M
M
M
F
6 Marvin Martian CA 91602 (818) 761-9964 26 M
Columnar insert not efficient for singleton insertions (OLTP). Batch loads touches
row vs. column. Batch load on column-oriented is faster (compression, no indexes).
Row oriented:
new rows deleted
Column oriented:
value deleted from
each file
Single-Row Operations - Delete
Key Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
6 Marvin Martian CA 91602 (818) 761-9964 26 M
Key
1
2
3
4
5
Fname
Bugs
Yosemite
Daffy
Elmer
Witch
Lname
Bunny
Sam
Duck
Fudd
Hazel
State
NY
CA
NY
ME
MA
Zip
11217
95389
10013
04578
01970
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
(978) 744-0991
Age
34
52
35
43
57
Sex
M
M
M
M
F
6 Marvin Martian CA 91602 (818) 761-9964 26 M
Recommended Partition Drop to allow dropping columns in bulk.
Row oriented:
Update 100% of rows
means change 100%
of blocks on disk.
Column oriented:
Just update the blocks
needed to be updated
Single-Row Operations - Update
Key Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
Key
1
2
3
4
5
Fname
Bugs
Yosemite
Daffy
Elmer
Witch
Lname
Bunny
Sam
Duck
Fudd
Hazel
State
NY
CA
NY
ME
MA
Zip
11217
95389
10013
04578
01970
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
(978) 744-0991
Age
34
52
35
43
57
Sex
M
M
M
M
F
Row oriented:
requires rebuilding of
the whole table
Column oriented:
Create new file for the
new column
Changing the table structure
Key Fname Lname State Zip Phone Age Sex Active
1 Bugs Bunny NY 11217 (718) 938-3235 34 M Y
2 Yosemite Sam CA 95389 (209) 375-6572 52 M N
3 Daffy Duck NY 10013 (212) 227-1810 35 M N
4 Elmer Fudd ME 04578 (207) 882-7323 43 M Y
5 Witch Hazel MA 01970 (978) 744-0991 57 F N
Key
1
2
3
4
5
Fname
Bugs
Yosemite
Daffy
Elmer
Witch
Lname
Bunny
Sam
Duck
Fudd
Hazel
State
NY
CA
NY
ME
MA
Zip
11217
95389
10013
04578
01970
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
(978) 744-0991
Age
34
52
35
43
57
Sex
M
M
M
M
F
Active
Y
N
N
Y
N
Column-oriented is very flexible for adding columns, no need for a full rebuild
required with it.
Horizontal
Partition:
8 Million Rows
Extent 2
Horizontal
Partition:
8 Million Rows
Extent 3
Horizontal
Partition:
8 Million Rows
Extent 1
Storage Architecture reduces I/O
• Only touch column files
that are in projection, filter
and join conditions
• Eliminate disk block touches
to partitions outside filter
and join conditions
Extent 1:
Min State: CA, Max State: NY
Extent 2:
Min State: OR, Max State: WY
Extent 3:
Min State: IA, Max State: TN
SELECT Fname FROM Table 1 WHERE State = ‘NY’
High Performance Query Processing
ID
1
2
3
4
...
8M
8M+1
...
16M
16M+1
...
24M
Fname
Bugs
Yosemite
Daffy
Hazel
...
...
Jane
...
Elmer
Lname
Bunny
Sam
Duck
Fudd
...
...
...
State
NY
CA
NY
ME
...
MN
WY
TX
OR
...
VA
TN
IA
NY
...
PA
Zip
11217
95389
10013
04578
...
...
...
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
...
...
...
Age
34
52
35
43
...
...
...
Sex
M
M
M
F
...
...
...
Vertical
Partition
Vertical
Partition
Vertical
Partition
Vertical
Partition
Vertical
Partition
…
ELIMINATED PARTITION
Analytics
– In-database distributed analytics with complex
join, aggregation, window functions
– Extensible UDF for custom analytics
– Cross Engine Join with other storage engines
Window functions
– PARTITION BY / ORDER BY
– Aggregate functions: MAX, MIN, COUNT,
SUM, AVG STD, STDDEV_SAMP,
STDDEV_POP, VAR_SAMP, VAR_POP
– Ranking: ROW_NUMBER, RANK,
DENSE_RANK, PERCENT_RANK
CUME_DIST, NTILE, PERCENTILE,
PERCENTILE_CONT
PERCENTILE_DISC, MEDIAN
Daily Running Average Revenue for each item
SELECT item_id, server_date, daily_revenue,
AVG(revenue) OVER
(PARTITION BY item_id ORDER BY server_date
RANGE INTERVAL '1' DAY PRECEDING ) running_avg
FROM web_item_sales
Item ID Server_date Revenue
1 02-01-2014 20,000.00
1 02-02-2014 5,001.00
2 02-01-2014 15,000.00
2 02-04-2014 34,029.00
2 02-05-2014 7,138.00
3 02-01-2014 17,250.00
3 02-03-2014 25,010.00
3 02-04-2014 21,034.00
3 02-05-2014 4,120.00
Running Average
20,000.00
12,500.50
15,000.00
34,209.00
20,583.50
17,250.00
250,100.00
12,577.00
20,583.50
MariaDB ColumnStore
Best practices
General
• Not suited for OLTP, needs big data to
process fast (millions of records)
• Micro-batch load allows near real-time
behaviour
• Infrequently used columns do not
impact other queries
• Columnar suitable for sparse columns
(nulls compress nicely)
Query Modeling
• Star-schema optimizations are
generally a good idea
• Conservative data typing is important
– fixed-length vs. dictionary boundary (8
bytes)
– IP Address vs. IP Number
• Break down compound fields into
individual fields
– Trivializes searching for sub-fields
– Can avoid dictionary overhead
– Cost to re-assemble is generally small
Best Practices
Cpimport
• Fastest way to load data from CSV file,
standard input, binary source file
• Multiple tables in can be loaded in
parallel by launching multiple jobs
• Read queries continue without being
blocked
• Successful cpimport is auto-committed
• In case of errors, entire load is rolled
back
LOAD DATA INFILE
• Traditional way of importing data into
any MariaDB storage engine table
• Up to 2 times slower than cpimport for
large size imports
• Either success or error operation can be
rolled back
Data Ingestion
HA at UM node
• When one UM node goes down, another
UM node takes over
HA at Data Storage
• AWS EBS (Elastic Block Store)
• GlusterFS - Multiple copy of data block
across storage. If a disk on a PM node
fails, another PM node will have access
to the copy of the data
High Availability
HA at PM node
• SAN/AWS EBS - When a PM node
goes down, the data volumes
attached to the failed PM node gets
attached to another PM
• Local Disks -If a PM node goes down,
the data on its disks are not available,
though queries continue on the
remaining data set
Where to find MariaDB ColumnStore?
SOFTWARE DOWNLOAD https://mariadb.com/downloads/columnstore
SOURCE https://github.com/mariadb-corporation/mariadb-columnstore-engine
DOCUMENTATION https://mariadb.com/kb/en/mariadb/mariadb-columnstore/
BLOGS https://mariadb.com/blog-tags/columnstore
</>
Thank you
Bruno Šimić
bruno@mariadb.com

More Related Content

Viewers also liked

Photoshop
PhotoshopPhotoshop
Photoshop
Juni Muryadi
 
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
Introduction to Amazon Relational Database Service
Amazon Web Services
 
Engineers: Jolt Your Creativity Into Overdrive
Engineers: Jolt Your Creativity Into OverdriveEngineers: Jolt Your Creativity Into Overdrive
Engineers: Jolt Your Creativity Into Overdrive
Hynes Industries
 
SENDERO ECOLIGICO
SENDERO ECOLIGICOSENDERO ECOLIGICO
SENDERO ECOLIGICO
leidy tatiana sanchez cruz
 
Nose job
Nose jobNose job
Nose job
Health First
 
Cono sur
Cono surCono sur
Cono sur
Yre Mancini
 

Viewers also liked (6)

Photoshop
PhotoshopPhotoshop
Photoshop
 
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
Introduction to Amazon Relational Database Service
 
Engineers: Jolt Your Creativity Into Overdrive
Engineers: Jolt Your Creativity Into OverdriveEngineers: Jolt Your Creativity Into Overdrive
Engineers: Jolt Your Creativity Into Overdrive
 
SENDERO ECOLIGICO
SENDERO ECOLIGICOSENDERO ECOLIGICO
SENDERO ECOLIGICO
 
Nose job
Nose jobNose job
Nose job
 
Cono sur
Cono surCono sur
Cono sur
 

Similar to MariaDB ColumnStore - LONDON MySQL Meetup

Big Data Analytics with MariaDB AX
Big Data Analytics with MariaDB AXBig Data Analytics with MariaDB AX
Big Data Analytics with MariaDB AX
MariaDB plc
 
In-depth session: Big Data Analytics with MariaDB AX
In-depth session: Big Data Analytics with MariaDB AXIn-depth session: Big Data Analytics with MariaDB AX
In-depth session: Big Data Analytics with MariaDB AX
MariaDB plc
 
Transactional and Analytics together: MariaDB and ColumnStore
Transactional and Analytics together: MariaDB and ColumnStoreTransactional and Analytics together: MariaDB and ColumnStore
Transactional and Analytics together: MariaDB and ColumnStore
mlraviol
 
Turning the tables: The Columnar Alternative
Turning the tables:  The Columnar AlternativeTurning the tables:  The Columnar Alternative
Turning the tables: The Columnar Alternative
Jim Tommaney
 
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStoreBig Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
Matt Stubbs
 
What You Need To Know About The Top Database Trends
What You Need To Know About The Top Database TrendsWhat You Need To Know About The Top Database Trends
What You Need To Know About The Top Database Trends
Dell World
 
Hotsos 2012
Hotsos 2012Hotsos 2012
Hotsos 2012
Connor McDonald
 
Big Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreBig Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStore
MariaDB plc
 
Performance tuning ColumnStore
Performance tuning ColumnStorePerformance tuning ColumnStore
Performance tuning ColumnStore
MariaDB plc
 
Getting started with Cassandra 2.1
Getting started with Cassandra 2.1Getting started with Cassandra 2.1
Getting started with Cassandra 2.1
Viswanath J
 
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfHailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
cookie1969
 
Demystifying Columnar Databases
Demystifying Columnar DatabasesDemystifying Columnar Databases
Demystifying Columnar Databases
June Tong
 
M|18 Understanding the Architecture of MariaDB ColumnStore
M|18 Understanding the Architecture of MariaDB ColumnStoreM|18 Understanding the Architecture of MariaDB ColumnStore
M|18 Understanding the Architecture of MariaDB ColumnStore
MariaDB plc
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
MySQL vs. MonetDB
MySQL vs. MonetDBMySQL vs. MonetDB
MySQL vs. MonetDB
"FENG "GEORGE"" YU
 
Functional Leap of Faith (Keynote at JDay Lviv 2014)
Functional Leap of Faith (Keynote at JDay Lviv 2014)Functional Leap of Faith (Keynote at JDay Lviv 2014)
Functional Leap of Faith (Keynote at JDay Lviv 2014)
Tomer Gabel
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
14 scaleabilty wics
14 scaleabilty wics14 scaleabilty wics
14 scaleabilty wics
ashish61_scs
 
cablecrimping-150219125046-conversion-gate01.pptx
cablecrimping-150219125046-conversion-gate01.pptxcablecrimping-150219125046-conversion-gate01.pptx
cablecrimping-150219125046-conversion-gate01.pptx
ANISHTP
 

Similar to MariaDB ColumnStore - LONDON MySQL Meetup (20)

Big Data Analytics with MariaDB AX
Big Data Analytics with MariaDB AXBig Data Analytics with MariaDB AX
Big Data Analytics with MariaDB AX
 
In-depth session: Big Data Analytics with MariaDB AX
In-depth session: Big Data Analytics with MariaDB AXIn-depth session: Big Data Analytics with MariaDB AX
In-depth session: Big Data Analytics with MariaDB AX
 
Transactional and Analytics together: MariaDB and ColumnStore
Transactional and Analytics together: MariaDB and ColumnStoreTransactional and Analytics together: MariaDB and ColumnStore
Transactional and Analytics together: MariaDB and ColumnStore
 
Turning the tables: The Columnar Alternative
Turning the tables:  The Columnar AlternativeTurning the tables:  The Columnar Alternative
Turning the tables: The Columnar Alternative
 
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStoreBig Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
Big Data LDN 2017: Big Data Analytics with MariaDB ColumnStore
 
What You Need To Know About The Top Database Trends
What You Need To Know About The Top Database TrendsWhat You Need To Know About The Top Database Trends
What You Need To Know About The Top Database Trends
 
Hotsos 2012
Hotsos 2012Hotsos 2012
Hotsos 2012
 
Big Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStoreBig Data Analytics with MariaDB ColumnStore
Big Data Analytics with MariaDB ColumnStore
 
Quick Wins
Quick WinsQuick Wins
Quick Wins
 
Performance tuning ColumnStore
Performance tuning ColumnStorePerformance tuning ColumnStore
Performance tuning ColumnStore
 
Getting started with Cassandra 2.1
Getting started with Cassandra 2.1Getting started with Cassandra 2.1
Getting started with Cassandra 2.1
 
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfHailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
 
Demystifying Columnar Databases
Demystifying Columnar DatabasesDemystifying Columnar Databases
Demystifying Columnar Databases
 
M|18 Understanding the Architecture of MariaDB ColumnStore
M|18 Understanding the Architecture of MariaDB ColumnStoreM|18 Understanding the Architecture of MariaDB ColumnStore
M|18 Understanding the Architecture of MariaDB ColumnStore
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
MySQL vs. MonetDB
MySQL vs. MonetDBMySQL vs. MonetDB
MySQL vs. MonetDB
 
Functional Leap of Faith (Keynote at JDay Lviv 2014)
Functional Leap of Faith (Keynote at JDay Lviv 2014)Functional Leap of Faith (Keynote at JDay Lviv 2014)
Functional Leap of Faith (Keynote at JDay Lviv 2014)
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
14 scaleabilty wics
14 scaleabilty wics14 scaleabilty wics
14 scaleabilty wics
 
cablecrimping-150219125046-conversion-gate01.pptx
cablecrimping-150219125046-conversion-gate01.pptxcablecrimping-150219125046-conversion-gate01.pptx
cablecrimping-150219125046-conversion-gate01.pptx
 

More from Ivan Zoratti

AI, ML and Graph Algorithms: Real Life Use Cases with Neo4j
AI, ML and Graph Algorithms: Real Life Use Cases with Neo4jAI, ML and Graph Algorithms: Real Life Use Cases with Neo4j
AI, ML and Graph Algorithms: Real Life Use Cases with Neo4j
Ivan Zoratti
 
Introducing the Open Edge Module
Introducing the Open Edge ModuleIntroducing the Open Edge Module
Introducing the Open Edge Module
Ivan Zoratti
 
MySQL Performance Tuning London Meetup June 2017
MySQL Performance Tuning London Meetup June 2017MySQL Performance Tuning London Meetup June 2017
MySQL Performance Tuning London Meetup June 2017
Ivan Zoratti
 
ScaleDB Technical Presentation
ScaleDB Technical PresentationScaleDB Technical Presentation
ScaleDB Technical Presentation
Ivan Zoratti
 
Time Series From Collection To Analysis
Time Series From Collection To AnalysisTime Series From Collection To Analysis
Time Series From Collection To Analysis
Ivan Zoratti
 
ScaleDB Technical Presentation
ScaleDB Technical PresentationScaleDB Technical Presentation
ScaleDB Technical Presentation
Ivan Zoratti
 
MySQL for Beginners - part 1
MySQL for Beginners - part 1MySQL for Beginners - part 1
MySQL for Beginners - part 1
Ivan Zoratti
 
Anatomy of a Proxy Server - MaxScale Internals
Anatomy of a Proxy Server - MaxScale InternalsAnatomy of a Proxy Server - MaxScale Internals
Anatomy of a Proxy Server - MaxScale Internals
Ivan Zoratti
 
Orchestrating MySQL
Orchestrating MySQLOrchestrating MySQL
Orchestrating MySQL
Ivan Zoratti
 
GTIDs Explained
GTIDs ExplainedGTIDs Explained
GTIDs Explained
Ivan Zoratti
 
The Evolution of Open Source Databases
The Evolution of Open Source DatabasesThe Evolution of Open Source Databases
The Evolution of Open Source Databases
Ivan Zoratti
 
MaxScale for Effective MySQL Meetup NYC - 14.01.21
MaxScale for Effective MySQL Meetup NYC - 14.01.21MaxScale for Effective MySQL Meetup NYC - 14.01.21
MaxScale for Effective MySQL Meetup NYC - 14.01.21
Ivan Zoratti
 
MariaDB 10 Tutorial - 13.11.11 - Percona Live London
MariaDB 10 Tutorial - 13.11.11 - Percona Live LondonMariaDB 10 Tutorial - 13.11.11 - Percona Live London
MariaDB 10 Tutorial - 13.11.11 - Percona Live London
Ivan Zoratti
 
SkySQL & MariaDB What's all the buzz?
SkySQL & MariaDB What's all the buzz?SkySQL & MariaDB What's all the buzz?
SkySQL & MariaDB What's all the buzz?
Ivan Zoratti
 
MySQL & MariaDB - Innovation Happens Here
MySQL & MariaDB - Innovation Happens HereMySQL & MariaDB - Innovation Happens Here
MySQL & MariaDB - Innovation Happens Here
Ivan Zoratti
 
What can we learn from NoSQL technologies?
What can we learn from NoSQL technologies?What can we learn from NoSQL technologies?
What can we learn from NoSQL technologies?
Ivan Zoratti
 
Big Data with MySQL
Big Data with MySQLBig Data with MySQL
Big Data with MySQL
Ivan Zoratti
 
Sky Is The limit
Sky Is The limitSky Is The limit
Sky Is The limit
Ivan Zoratti
 
The sky's the limit
The sky's the limitThe sky's the limit
The sky's the limit
Ivan Zoratti
 

More from Ivan Zoratti (20)

AI, ML and Graph Algorithms: Real Life Use Cases with Neo4j
AI, ML and Graph Algorithms: Real Life Use Cases with Neo4jAI, ML and Graph Algorithms: Real Life Use Cases with Neo4j
AI, ML and Graph Algorithms: Real Life Use Cases with Neo4j
 
Introducing the Open Edge Module
Introducing the Open Edge ModuleIntroducing the Open Edge Module
Introducing the Open Edge Module
 
MySQL Performance Tuning London Meetup June 2017
MySQL Performance Tuning London Meetup June 2017MySQL Performance Tuning London Meetup June 2017
MySQL Performance Tuning London Meetup June 2017
 
ScaleDB Technical Presentation
ScaleDB Technical PresentationScaleDB Technical Presentation
ScaleDB Technical Presentation
 
Time Series From Collection To Analysis
Time Series From Collection To AnalysisTime Series From Collection To Analysis
Time Series From Collection To Analysis
 
ScaleDB Technical Presentation
ScaleDB Technical PresentationScaleDB Technical Presentation
ScaleDB Technical Presentation
 
MySQL for Beginners - part 1
MySQL for Beginners - part 1MySQL for Beginners - part 1
MySQL for Beginners - part 1
 
Anatomy of a Proxy Server - MaxScale Internals
Anatomy of a Proxy Server - MaxScale InternalsAnatomy of a Proxy Server - MaxScale Internals
Anatomy of a Proxy Server - MaxScale Internals
 
Orchestrating MySQL
Orchestrating MySQLOrchestrating MySQL
Orchestrating MySQL
 
GTIDs Explained
GTIDs ExplainedGTIDs Explained
GTIDs Explained
 
The Evolution of Open Source Databases
The Evolution of Open Source DatabasesThe Evolution of Open Source Databases
The Evolution of Open Source Databases
 
MaxScale for Effective MySQL Meetup NYC - 14.01.21
MaxScale for Effective MySQL Meetup NYC - 14.01.21MaxScale for Effective MySQL Meetup NYC - 14.01.21
MaxScale for Effective MySQL Meetup NYC - 14.01.21
 
MariaDB 10 Tutorial - 13.11.11 - Percona Live London
MariaDB 10 Tutorial - 13.11.11 - Percona Live LondonMariaDB 10 Tutorial - 13.11.11 - Percona Live London
MariaDB 10 Tutorial - 13.11.11 - Percona Live London
 
SkySQL & MariaDB What's all the buzz?
SkySQL & MariaDB What's all the buzz?SkySQL & MariaDB What's all the buzz?
SkySQL & MariaDB What's all the buzz?
 
MySQL & MariaDB - Innovation Happens Here
MySQL & MariaDB - Innovation Happens HereMySQL & MariaDB - Innovation Happens Here
MySQL & MariaDB - Innovation Happens Here
 
What can we learn from NoSQL technologies?
What can we learn from NoSQL technologies?What can we learn from NoSQL technologies?
What can we learn from NoSQL technologies?
 
Big Data with MySQL
Big Data with MySQLBig Data with MySQL
Big Data with MySQL
 
Sky Is The limit
Sky Is The limitSky Is The limit
Sky Is The limit
 
The sky's the limit
The sky's the limitThe sky's the limit
The sky's the limit
 
HA Reloaded
HA ReloadedHA Reloaded
HA Reloaded
 

Recently uploaded

みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 

Recently uploaded (20)

みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 

MariaDB ColumnStore - LONDON MySQL Meetup

  • 2. Data and big data OLTP vs. OLAP
  • 3. OLTP On-line Transaction Processing • large number of short on-line transactions (INSERT, UPDATE, DELETE) • very fast query processing, maintaining data integrity in multi-access environments • effectiveness measured by number of transactions per second • operational (detailed and current) data • OLTPs are the original data source OLAP On-line Analytical Processing • characterized by low volume of concurrent transactions • complex queries, often involving aggregations • response time is effectiveness measure • data is aggregated, historical and stored in multi-dimensional schemas • OLAP data comes from the various OLTP Databases Rows/DataSize Scope 1 100 10,000 1,000,000 100,000,000 10,000,000,000 100,000,000,000 10-100GB 100-1000GB 1-10TB 10-100TB...PB MariaDB OLTP MariaDB ColumnStore OLAP
  • 4. 1. Descriptive Analytics What is Happening? Traditional OLAP 2. Diagnostic Analytics Why did it Happen? 3. Predictive Analytics What is likely to happen? 4. Prescriptive Analytics What should I do about it? Big Data Analytics Social Media Sensors Node 1 Biometrics Mobile Data Collection MariaDB ColumnStore Data Processing BI Tools, Data Science Applications Connectors, SPARK Integration etc MariaDB MaxScale Transactional, Operational Analytics Insight
  • 5. MariaDB ColumnStore Architecture Columnar Distributed Data Storage Local Storage | SAN | EBS | Gluster FS BI Tool SQL Client Custom Big Data App Application MariaDB SQL Front End Distributed Query Engine Data Storage User Module (UM) Performance Module (PM)
  • 6. User Modules • mysqld - The MariaDB server • ExeMgr - MariaDB’s interface to ColumnStore • cpimport - high-performance data import Query Processing - UM • SQL Operations are translated into thousands of Primitives • Parallel/Distributed SQL • Extensible with Parallel/Distributed UDFs • Query is parsed by mysqld on UM node • Parsed query handed over to ExeMgr on UM node • ExecMgr breaks down the query in primitive operations MariaDB SQL Front End User Module (UM)
  • 7. Performance Modules • PrimProc - Primitives Processor • WriteEngineServ - Database file writing processor • DMLProc - DML writes processor • DDLProc - DDL processor Query Processing - PM • Primitives are processed on PM • One thread working on a range of rows • Typically 1/2 million rows, stored in a few hundred blocks of data • Execute all column operations required (restriction and projection) • Execute any group by/aggregation against local data • Each primitive executes in a fraction of a second • Primitives are run in parallel and fully distributed Distributed Query Engine Performance Module (PM)
  • 8. Storage Architecture • Columnar storage – Each column stored as separate file – No index management for query performance tuning – Online Schema changes: Add new column without impacting running queries • Automatic horizontal partitioning – Logical partition every 8 Million rows – In memory metadata of partition min and max – No partition management for query performance tuning • Compression – Accelerate decompression rate – Reduce I/O for compressed blocks Column 1 Extent 1 (8 million rows, 8MB~64MB) Extent 2 (8 million rows) Extent M (8 million rows) Column 2 Column 3 ... Column N Data automatically arranged by • Column – Acts as Vertical Partitioning • Extents – Acts as horizontal partition Vertical Partition Horizontal Partition ... Vertical Partition Vertical Partition Vertical Partition Horizontal Partition Horizontal Partition
  • 9. High Performance Data Ingestion • Fully parallel high speed data load – Parallel data loads on all PMs simultaneously – Multiple tables in can be loaded simultaneously – Read queries continue without being blocked • Micro-batch loading for real-time data flow Column 1 Extent 1 (8 million rows, 8MB~64MB) Extent 2 (8 million rows) Extent M (8 million rows) Column 2 ... Column N Horizontal Partition ... Horizontal Partition Horizontal Partition High Water Mark New Data being loaded Dataaccessedby runningqueries
  • 11. Row oriented: rows stored sequentially in a file. Column oriented: Each column is stored in a separate file. Each column for a given row is at the same offset. Row-oriented vs. Column-oriented format Key Fname Lname State Zip Phone Age Sex 1 Bugs Bunny NY 11217 (718) 938-3235 34 M 2 Yosemite Sam CA 95389 (209) 375-6572 52 M 3 Daffy Duck NY 10013 (212) 227-1810 35 M 4 Elmer Fudd ME 04578 (207) 882-7323 43 M 5 Witch Hazel MA 01970 (978) 744-0991 57 F Key 1 2 3 4 5 Fname Bugs Yosemite Daffy Elmer Witch Lname Bunny Sam Duck Fudd Hazel State NY CA NY ME MA Zip 11217 95389 10013 04578 01970 Phone (718) 938-3235 (209) 375-6572 (212) 227-1810 (207) 882-7323 (978) 744-0991 Age 34 52 35 43 57 Sex M M M M F
  • 12. Row oriented: new rows appended to the end. Column oriented: new value added to each file. Single-Row Operations - Insert Key Fname Lname State Zip Phone Age Sex 1 Bugs Bunny NY 11217 (718) 938-3235 34 M 2 Yosemite Sam CA 95389 (209) 375-6572 52 M 3 Daffy Duck NY 10013 (212) 227-1810 35 M 4 Elmer Fudd ME 04578 (207) 882-7323 43 M 5 Witch Hazel MA 01970 (978) 744-0991 57 F 6 Marvin Martian CA 91602 (818) 761-9964 26 M Key 1 2 3 4 5 Fname Bugs Yosemite Daffy Elmer Witch Lname Bunny Sam Duck Fudd Hazel State NY CA NY ME MA Zip 11217 95389 10013 04578 01970 Phone (718) 938-3235 (209) 375-6572 (212) 227-1810 (207) 882-7323 (978) 744-0991 Age 34 52 35 43 57 Sex M M M M F 6 Marvin Martian CA 91602 (818) 761-9964 26 M Columnar insert not efficient for singleton insertions (OLTP). Batch loads touches row vs. column. Batch load on column-oriented is faster (compression, no indexes).
  • 13. Row oriented: new rows deleted Column oriented: value deleted from each file Single-Row Operations - Delete Key Fname Lname State Zip Phone Age Sex 1 Bugs Bunny NY 11217 (718) 938-3235 34 M 2 Yosemite Sam CA 95389 (209) 375-6572 52 M 3 Daffy Duck NY 10013 (212) 227-1810 35 M 4 Elmer Fudd ME 04578 (207) 882-7323 43 M 5 Witch Hazel MA 01970 (978) 744-0991 57 F 6 Marvin Martian CA 91602 (818) 761-9964 26 M Key 1 2 3 4 5 Fname Bugs Yosemite Daffy Elmer Witch Lname Bunny Sam Duck Fudd Hazel State NY CA NY ME MA Zip 11217 95389 10013 04578 01970 Phone (718) 938-3235 (209) 375-6572 (212) 227-1810 (207) 882-7323 (978) 744-0991 Age 34 52 35 43 57 Sex M M M M F 6 Marvin Martian CA 91602 (818) 761-9964 26 M Recommended Partition Drop to allow dropping columns in bulk.
  • 14. Row oriented: Update 100% of rows means change 100% of blocks on disk. Column oriented: Just update the blocks needed to be updated Single-Row Operations - Update Key Fname Lname State Zip Phone Age Sex 1 Bugs Bunny NY 11217 (718) 938-3235 34 M 2 Yosemite Sam CA 95389 (209) 375-6572 52 M 3 Daffy Duck NY 10013 (212) 227-1810 35 M 4 Elmer Fudd ME 04578 (207) 882-7323 43 M 5 Witch Hazel MA 01970 (978) 744-0991 57 F Key 1 2 3 4 5 Fname Bugs Yosemite Daffy Elmer Witch Lname Bunny Sam Duck Fudd Hazel State NY CA NY ME MA Zip 11217 95389 10013 04578 01970 Phone (718) 938-3235 (209) 375-6572 (212) 227-1810 (207) 882-7323 (978) 744-0991 Age 34 52 35 43 57 Sex M M M M F
  • 15. Row oriented: requires rebuilding of the whole table Column oriented: Create new file for the new column Changing the table structure Key Fname Lname State Zip Phone Age Sex Active 1 Bugs Bunny NY 11217 (718) 938-3235 34 M Y 2 Yosemite Sam CA 95389 (209) 375-6572 52 M N 3 Daffy Duck NY 10013 (212) 227-1810 35 M N 4 Elmer Fudd ME 04578 (207) 882-7323 43 M Y 5 Witch Hazel MA 01970 (978) 744-0991 57 F N Key 1 2 3 4 5 Fname Bugs Yosemite Daffy Elmer Witch Lname Bunny Sam Duck Fudd Hazel State NY CA NY ME MA Zip 11217 95389 10013 04578 01970 Phone (718) 938-3235 (209) 375-6572 (212) 227-1810 (207) 882-7323 (978) 744-0991 Age 34 52 35 43 57 Sex M M M M F Active Y N N Y N Column-oriented is very flexible for adding columns, no need for a full rebuild required with it.
  • 16. Horizontal Partition: 8 Million Rows Extent 2 Horizontal Partition: 8 Million Rows Extent 3 Horizontal Partition: 8 Million Rows Extent 1 Storage Architecture reduces I/O • Only touch column files that are in projection, filter and join conditions • Eliminate disk block touches to partitions outside filter and join conditions Extent 1: Min State: CA, Max State: NY Extent 2: Min State: OR, Max State: WY Extent 3: Min State: IA, Max State: TN SELECT Fname FROM Table 1 WHERE State = ‘NY’ High Performance Query Processing ID 1 2 3 4 ... 8M 8M+1 ... 16M 16M+1 ... 24M Fname Bugs Yosemite Daffy Hazel ... ... Jane ... Elmer Lname Bunny Sam Duck Fudd ... ... ... State NY CA NY ME ... MN WY TX OR ... VA TN IA NY ... PA Zip 11217 95389 10013 04578 ... ... ... Phone (718) 938-3235 (209) 375-6572 (212) 227-1810 (207) 882-7323 ... ... ... Age 34 52 35 43 ... ... ... Sex M M M F ... ... ... Vertical Partition Vertical Partition Vertical Partition Vertical Partition Vertical Partition … ELIMINATED PARTITION
  • 17. Analytics – In-database distributed analytics with complex join, aggregation, window functions – Extensible UDF for custom analytics – Cross Engine Join with other storage engines Window functions – PARTITION BY / ORDER BY – Aggregate functions: MAX, MIN, COUNT, SUM, AVG STD, STDDEV_SAMP, STDDEV_POP, VAR_SAMP, VAR_POP – Ranking: ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK CUME_DIST, NTILE, PERCENTILE, PERCENTILE_CONT PERCENTILE_DISC, MEDIAN Daily Running Average Revenue for each item SELECT item_id, server_date, daily_revenue, AVG(revenue) OVER (PARTITION BY item_id ORDER BY server_date RANGE INTERVAL '1' DAY PRECEDING ) running_avg FROM web_item_sales Item ID Server_date Revenue 1 02-01-2014 20,000.00 1 02-02-2014 5,001.00 2 02-01-2014 15,000.00 2 02-04-2014 34,029.00 2 02-05-2014 7,138.00 3 02-01-2014 17,250.00 3 02-03-2014 25,010.00 3 02-04-2014 21,034.00 3 02-05-2014 4,120.00 Running Average 20,000.00 12,500.50 15,000.00 34,209.00 20,583.50 17,250.00 250,100.00 12,577.00 20,583.50
  • 19. General • Not suited for OLTP, needs big data to process fast (millions of records) • Micro-batch load allows near real-time behaviour • Infrequently used columns do not impact other queries • Columnar suitable for sparse columns (nulls compress nicely) Query Modeling • Star-schema optimizations are generally a good idea • Conservative data typing is important – fixed-length vs. dictionary boundary (8 bytes) – IP Address vs. IP Number • Break down compound fields into individual fields – Trivializes searching for sub-fields – Can avoid dictionary overhead – Cost to re-assemble is generally small Best Practices
  • 20. Cpimport • Fastest way to load data from CSV file, standard input, binary source file • Multiple tables in can be loaded in parallel by launching multiple jobs • Read queries continue without being blocked • Successful cpimport is auto-committed • In case of errors, entire load is rolled back LOAD DATA INFILE • Traditional way of importing data into any MariaDB storage engine table • Up to 2 times slower than cpimport for large size imports • Either success or error operation can be rolled back Data Ingestion
  • 21. HA at UM node • When one UM node goes down, another UM node takes over HA at Data Storage • AWS EBS (Elastic Block Store) • GlusterFS - Multiple copy of data block across storage. If a disk on a PM node fails, another PM node will have access to the copy of the data High Availability HA at PM node • SAN/AWS EBS - When a PM node goes down, the data volumes attached to the failed PM node gets attached to another PM • Local Disks -If a PM node goes down, the data on its disks are not available, though queries continue on the remaining data set
  • 22. Where to find MariaDB ColumnStore? SOFTWARE DOWNLOAD https://mariadb.com/downloads/columnstore SOURCE https://github.com/mariadb-corporation/mariadb-columnstore-engine DOCUMENTATION https://mariadb.com/kb/en/mariadb/mariadb-columnstore/ BLOGS https://mariadb.com/blog-tags/columnstore </>