The document provides an overview of MariaDB ColumnStore, an open source column-oriented database engine for analytics and business intelligence (BI). It describes key differences between row-oriented and column-oriented storage, how ColumnStore provides high performance for analytical queries through its columnar storage and distributed architecture, and best practices for data ingestion and querying ColumnStore.
What makes a “good” service is a moving target. Technologies and requirements change over time. It can be impossible to ensure that none of your services have been left behind.
The Service ScoreCard approach is to have a small check for each service initiative we have, this could be anything measurable; deployment frequency, the oncall team all have phone; ensuring the latest version of the JVM.
The Service ScoreCard, gives each service a grade from 'F' to 'A+', based on passing or failing the list of checks. As soon as anyone see the service grade’s slipping everyone rallies to improve the grades.
We can then set up rules based on the grades, “Only B and above services can deploy 24 / 7”, “moratorium on services without an A+” or “No SRE support until the services below C grade”.
8 Practical Ways to Help Children Who Are Syrian Refugees: Part OneJohn Slifko, Ph.D
John Slifko teaches about how Syrian refugee children suffer because of the current civil war and how you can start and continue to ease their suffering.
What makes a “good” service is a moving target. Technologies and requirements change over time. It can be impossible to ensure that none of your services have been left behind.
The Service ScoreCard approach is to have a small check for each service initiative we have, this could be anything measurable; deployment frequency, the oncall team all have phone; ensuring the latest version of the JVM.
The Service ScoreCard, gives each service a grade from 'F' to 'A+', based on passing or failing the list of checks. As soon as anyone see the service grade’s slipping everyone rallies to improve the grades.
We can then set up rules based on the grades, “Only B and above services can deploy 24 / 7”, “moratorium on services without an A+” or “No SRE support until the services below C grade”.
8 Practical Ways to Help Children Who Are Syrian Refugees: Part OneJohn Slifko, Ph.D
John Slifko teaches about how Syrian refugee children suffer because of the current civil war and how you can start and continue to ease their suffering.
Amazon RDS allows you to launch an optimally configured, secure and highly available database with just a few clicks. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business. We’ll discuss Amazon RDS fundamentals, learn about the six available database engines (with the seventh on the way), and examine customer success stories.
Rhinoplasty (more commonly referred to as a nose job) is a surgical procedure that reshapes or resizes the nose. For more information about Nose job read here
http://healthfirstmagazine.blogspot.com/2017/03/nose-job.html
Transactional and Analytics together: MariaDB and ColumnStoremlraviol
MariaDB ColumnStore extends MariaDB Server, a relational database for transaction processing, with distributed columnar storage and parallel query processing for scalable, high-performance analytical processing. This session helps to understand how MariaDB ColumnStore works and why it’s needed for more demanding analytical workloads.
What You Need To Know About The Top Database TrendsDell World
The last 5 years have seen transformative changes in both personal and enterprise technologies. Many of these changes have been driven by or are driving paradigm shifts in database technologies and information systems. These include trends such as engineered systems including Exadata, "Big Data" technologies such as Hadoop ,"NoSQL" databases, SSDs, in-memory and columnar technologies. In this presentation we’ll review these big trends and describe how they are changing the database landscape and influencing the career prospects for database professionals.
When your query execution is slow, a couple of questions arise. Where to look for resources utilization? What tools do you have to analyze CPU, hard drive and RAM bottlenecks? Could you do something to reduce query execution time? MariaDB's Patrick LeBlanc and Roman Nozdrin touch on both Columnstore's query execution introspection tools as well as operating system capabilities that everyone should know about. They go on to discuss a number of real life use cases too. Some called for configuration changes whilst others forced them to make serious changes in the code.
This is an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
Amazon RDS allows you to launch an optimally configured, secure and highly available database with just a few clicks. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business. We’ll discuss Amazon RDS fundamentals, learn about the six available database engines (with the seventh on the way), and examine customer success stories.
Rhinoplasty (more commonly referred to as a nose job) is a surgical procedure that reshapes or resizes the nose. For more information about Nose job read here
http://healthfirstmagazine.blogspot.com/2017/03/nose-job.html
Transactional and Analytics together: MariaDB and ColumnStoremlraviol
MariaDB ColumnStore extends MariaDB Server, a relational database for transaction processing, with distributed columnar storage and parallel query processing for scalable, high-performance analytical processing. This session helps to understand how MariaDB ColumnStore works and why it’s needed for more demanding analytical workloads.
What You Need To Know About The Top Database TrendsDell World
The last 5 years have seen transformative changes in both personal and enterprise technologies. Many of these changes have been driven by or are driving paradigm shifts in database technologies and information systems. These include trends such as engineered systems including Exadata, "Big Data" technologies such as Hadoop ,"NoSQL" databases, SSDs, in-memory and columnar technologies. In this presentation we’ll review these big trends and describe how they are changing the database landscape and influencing the career prospects for database professionals.
When your query execution is slow, a couple of questions arise. Where to look for resources utilization? What tools do you have to analyze CPU, hard drive and RAM bottlenecks? Could you do something to reduce query execution time? MariaDB's Patrick LeBlanc and Roman Nozdrin touch on both Columnstore's query execution introspection tools as well as operating system capabilities that everyone should know about. They go on to discuss a number of real life use cases too. Some called for configuration changes whilst others forced them to make serious changes in the code.
This is an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
Functional Leap of Faith (Keynote at JDay Lviv 2014)Tomer Gabel
Keynote talk given at JDay Lviv 2014 in Ukraine (http://www.jday.com.ua/). Video coming soon.
Abstract:
Some say that there's nothing new under the sun. However, looking back on five to six decades of computing, it's easy to see that things progress at their own leisurly pace. Structured programming, originating in the '60s, did not gain mainstream adoption until the '80s; object-oriented programming was hotly debated in the '70s and '80s but only gained widespread acceptance in the '90s. Every couple of decades sees an engineering leap that radically improves the software engineering discipline across the board. I believe we are now at such an inflection point, with functional programming concepts slowly sifting into the mainstream. After this talk, I hope you will too.
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all of your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing, scale-out architecture, and columnar direct-attached storage to minimize I/O time and maximize performance. Learn how you can gain deeper business insights and save money and time by migrating to Amazon Redshift. Take away strategies for migrating from on-premises data warehousing solutions, tuning schema and queries, and utilizing third party solutions.
AI, ML and Graph Algorithms: Real Life Use Cases with Neo4jIvan Zoratti
I gave this presentation at DataOps 19 in Barcelona.
You will find information about Neo4j and how to use it with Graph Algorithms for Machine Learning and Artificial Intelligence.
This is the presentation at the OSIsoft EMEA User Conference in London, 16 October 2017.
Please note that "Open Edge Module" and "FogLAMP" are synonyms.
Time Series From Collection To AnalysisIvan Zoratti
This is my talk at Percona Live 2016 in Santa Clara. It is a quick walkthrough time series workloads and solutions with traditional relational databases and dedicated time series DBs
This is the presentation at Percona Live 2015 on MySQL, MariaDB and Percona Orchestration on bare metal, virtualised environments and clouds (AWS and OpenStack).
These are the slides that I presented at Percona Live London, 4th Dec 2012.
There is lots of content related to the deployment and use of MySQL in the cloud, specifically in Amazon EC2.
These are the slides of my presentation at the NYC MySQL Meetup on Sep 21 2012. There are tips and tricks about MySQL in the cloud and the SkySQL cloud data suite
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GridMate - End to end testing is a critical piece to ensure quality and avoid...ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
3. OLTP
On-line Transaction Processing
• large number of short on-line
transactions (INSERT, UPDATE,
DELETE)
• very fast query processing, maintaining
data integrity in multi-access
environments
• effectiveness measured by number of
transactions per second
• operational (detailed and current) data
• OLTPs are the original data source
OLAP
On-line Analytical Processing
• characterized by low volume of
concurrent transactions
• complex queries, often involving
aggregations
• response time is effectiveness measure
• data is aggregated, historical and stored
in multi-dimensional schemas
• OLAP data comes from the various
OLTP Databases
Rows/DataSize Scope
1 100 10,000 1,000,000 100,000,000 10,000,000,000 100,000,000,000
10-100GB 100-1000GB 1-10TB 10-100TB...PB
MariaDB OLTP MariaDB ColumnStore OLAP
4. 1. Descriptive Analytics
What is Happening?
Traditional OLAP
2. Diagnostic Analytics
Why did it Happen?
3. Predictive Analytics
What is likely to happen?
4. Prescriptive Analytics
What should I do about it?
Big Data Analytics
Social Media
Sensors
Node 1
Biometrics
Mobile
Data Collection
MariaDB
ColumnStore
Data Processing
BI Tools, Data Science
Applications
Connectors,
SPARK Integration etc
MariaDB
MaxScale
Transactional,
Operational
Analytics
Insight
5. MariaDB ColumnStore Architecture
Columnar Distributed Data Storage
Local Storage | SAN | EBS | Gluster FS
BI Tool SQL Client Custom
Big Data
App
Application
MariaDB SQL
Front End
Distributed
Query Engine
Data
Storage
User Module (UM)
Performance
Module (PM)
6. User Modules
• mysqld - The MariaDB server
• ExeMgr - MariaDB’s interface to
ColumnStore
• cpimport - high-performance data
import
Query Processing - UM
• SQL Operations are translated into
thousands of Primitives
• Parallel/Distributed SQL
• Extensible with Parallel/Distributed
UDFs
• Query is parsed by mysqld on UM node
• Parsed query handed over to ExeMgr
on UM node
• ExecMgr breaks down the query in
primitive operations
MariaDB SQL
Front End
User Module (UM)
7. Performance Modules
• PrimProc - Primitives Processor
• WriteEngineServ - Database file writing
processor
• DMLProc - DML writes processor
• DDLProc - DDL processor
Query Processing - PM
• Primitives are processed on PM
• One thread working on a range of rows
• Typically 1/2 million rows, stored in a
few hundred blocks of data
• Execute all column operations required
(restriction and projection)
• Execute any group by/aggregation
against local data
• Each primitive executes in a fraction of
a second
• Primitives are run in parallel and fully
distributed
Distributed
Query Engine
Performance
Module (PM)
8. Storage Architecture
• Columnar storage
– Each column stored as separate file
– No index management for query
performance tuning
– Online Schema changes: Add new column
without impacting running queries
• Automatic horizontal partitioning
– Logical partition every 8 Million rows
– In memory metadata of partition min and max
– No partition management for query
performance tuning
• Compression
– Accelerate decompression rate
– Reduce I/O for compressed blocks
Column 1
Extent 1 (8 million rows, 8MB~64MB)
Extent 2 (8 million rows)
Extent M (8 million rows)
Column 2 Column 3 ... Column N
Data automatically arranged by
• Column – Acts as Vertical Partitioning
• Extents – Acts as horizontal partition
Vertical
Partition
Horizontal
Partition
...
Vertical
Partition
Vertical
Partition
Vertical
Partition
Horizontal
Partition
Horizontal
Partition
9. High Performance Data Ingestion
• Fully parallel high
speed data load
– Parallel data loads on all PMs simultaneously
– Multiple tables in can be loaded simultaneously
– Read queries continue without being blocked
• Micro-batch loading
for real-time data flow
Column 1
Extent 1 (8 million rows, 8MB~64MB)
Extent 2 (8 million rows)
Extent M (8 million rows)
Column 2 ... Column N
Horizontal
Partition
...
Horizontal
Partition
Horizontal
Partition
High Water Mark
New Data being loaded
Dataaccessedby
runningqueries
11. Row oriented:
rows stored
sequentially in a file.
Column oriented:
Each column is stored
in a separate file. Each
column for a given
row is at the same
offset.
Row-oriented vs. Column-oriented format
Key Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
Key
1
2
3
4
5
Fname
Bugs
Yosemite
Daffy
Elmer
Witch
Lname
Bunny
Sam
Duck
Fudd
Hazel
State
NY
CA
NY
ME
MA
Zip
11217
95389
10013
04578
01970
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
(978) 744-0991
Age
34
52
35
43
57
Sex
M
M
M
M
F
12. Row oriented:
new rows appended to
the end.
Column oriented:
new value added to
each file.
Single-Row Operations - Insert
Key Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
6 Marvin Martian CA 91602 (818) 761-9964 26 M
Key
1
2
3
4
5
Fname
Bugs
Yosemite
Daffy
Elmer
Witch
Lname
Bunny
Sam
Duck
Fudd
Hazel
State
NY
CA
NY
ME
MA
Zip
11217
95389
10013
04578
01970
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
(978) 744-0991
Age
34
52
35
43
57
Sex
M
M
M
M
F
6 Marvin Martian CA 91602 (818) 761-9964 26 M
Columnar insert not efficient for singleton insertions (OLTP). Batch loads touches
row vs. column. Batch load on column-oriented is faster (compression, no indexes).
13. Row oriented:
new rows deleted
Column oriented:
value deleted from
each file
Single-Row Operations - Delete
Key Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
6 Marvin Martian CA 91602 (818) 761-9964 26 M
Key
1
2
3
4
5
Fname
Bugs
Yosemite
Daffy
Elmer
Witch
Lname
Bunny
Sam
Duck
Fudd
Hazel
State
NY
CA
NY
ME
MA
Zip
11217
95389
10013
04578
01970
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
(978) 744-0991
Age
34
52
35
43
57
Sex
M
M
M
M
F
6 Marvin Martian CA 91602 (818) 761-9964 26 M
Recommended Partition Drop to allow dropping columns in bulk.
14. Row oriented:
Update 100% of rows
means change 100%
of blocks on disk.
Column oriented:
Just update the blocks
needed to be updated
Single-Row Operations - Update
Key Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
Key
1
2
3
4
5
Fname
Bugs
Yosemite
Daffy
Elmer
Witch
Lname
Bunny
Sam
Duck
Fudd
Hazel
State
NY
CA
NY
ME
MA
Zip
11217
95389
10013
04578
01970
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
(978) 744-0991
Age
34
52
35
43
57
Sex
M
M
M
M
F
15. Row oriented:
requires rebuilding of
the whole table
Column oriented:
Create new file for the
new column
Changing the table structure
Key Fname Lname State Zip Phone Age Sex Active
1 Bugs Bunny NY 11217 (718) 938-3235 34 M Y
2 Yosemite Sam CA 95389 (209) 375-6572 52 M N
3 Daffy Duck NY 10013 (212) 227-1810 35 M N
4 Elmer Fudd ME 04578 (207) 882-7323 43 M Y
5 Witch Hazel MA 01970 (978) 744-0991 57 F N
Key
1
2
3
4
5
Fname
Bugs
Yosemite
Daffy
Elmer
Witch
Lname
Bunny
Sam
Duck
Fudd
Hazel
State
NY
CA
NY
ME
MA
Zip
11217
95389
10013
04578
01970
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
(978) 744-0991
Age
34
52
35
43
57
Sex
M
M
M
M
F
Active
Y
N
N
Y
N
Column-oriented is very flexible for adding columns, no need for a full rebuild
required with it.
16. Horizontal
Partition:
8 Million Rows
Extent 2
Horizontal
Partition:
8 Million Rows
Extent 3
Horizontal
Partition:
8 Million Rows
Extent 1
Storage Architecture reduces I/O
• Only touch column files
that are in projection, filter
and join conditions
• Eliminate disk block touches
to partitions outside filter
and join conditions
Extent 1:
Min State: CA, Max State: NY
Extent 2:
Min State: OR, Max State: WY
Extent 3:
Min State: IA, Max State: TN
SELECT Fname FROM Table 1 WHERE State = ‘NY’
High Performance Query Processing
ID
1
2
3
4
...
8M
8M+1
...
16M
16M+1
...
24M
Fname
Bugs
Yosemite
Daffy
Hazel
...
...
Jane
...
Elmer
Lname
Bunny
Sam
Duck
Fudd
...
...
...
State
NY
CA
NY
ME
...
MN
WY
TX
OR
...
VA
TN
IA
NY
...
PA
Zip
11217
95389
10013
04578
...
...
...
Phone
(718) 938-3235
(209) 375-6572
(212) 227-1810
(207) 882-7323
...
...
...
Age
34
52
35
43
...
...
...
Sex
M
M
M
F
...
...
...
Vertical
Partition
Vertical
Partition
Vertical
Partition
Vertical
Partition
Vertical
Partition
…
ELIMINATED PARTITION
17. Analytics
– In-database distributed analytics with complex
join, aggregation, window functions
– Extensible UDF for custom analytics
– Cross Engine Join with other storage engines
Window functions
– PARTITION BY / ORDER BY
– Aggregate functions: MAX, MIN, COUNT,
SUM, AVG STD, STDDEV_SAMP,
STDDEV_POP, VAR_SAMP, VAR_POP
– Ranking: ROW_NUMBER, RANK,
DENSE_RANK, PERCENT_RANK
CUME_DIST, NTILE, PERCENTILE,
PERCENTILE_CONT
PERCENTILE_DISC, MEDIAN
Daily Running Average Revenue for each item
SELECT item_id, server_date, daily_revenue,
AVG(revenue) OVER
(PARTITION BY item_id ORDER BY server_date
RANGE INTERVAL '1' DAY PRECEDING ) running_avg
FROM web_item_sales
Item ID Server_date Revenue
1 02-01-2014 20,000.00
1 02-02-2014 5,001.00
2 02-01-2014 15,000.00
2 02-04-2014 34,029.00
2 02-05-2014 7,138.00
3 02-01-2014 17,250.00
3 02-03-2014 25,010.00
3 02-04-2014 21,034.00
3 02-05-2014 4,120.00
Running Average
20,000.00
12,500.50
15,000.00
34,209.00
20,583.50
17,250.00
250,100.00
12,577.00
20,583.50
19. General
• Not suited for OLTP, needs big data to
process fast (millions of records)
• Micro-batch load allows near real-time
behaviour
• Infrequently used columns do not
impact other queries
• Columnar suitable for sparse columns
(nulls compress nicely)
Query Modeling
• Star-schema optimizations are
generally a good idea
• Conservative data typing is important
– fixed-length vs. dictionary boundary (8
bytes)
– IP Address vs. IP Number
• Break down compound fields into
individual fields
– Trivializes searching for sub-fields
– Can avoid dictionary overhead
– Cost to re-assemble is generally small
Best Practices
20. Cpimport
• Fastest way to load data from CSV file,
standard input, binary source file
• Multiple tables in can be loaded in
parallel by launching multiple jobs
• Read queries continue without being
blocked
• Successful cpimport is auto-committed
• In case of errors, entire load is rolled
back
LOAD DATA INFILE
• Traditional way of importing data into
any MariaDB storage engine table
• Up to 2 times slower than cpimport for
large size imports
• Either success or error operation can be
rolled back
Data Ingestion
21. HA at UM node
• When one UM node goes down, another
UM node takes over
HA at Data Storage
• AWS EBS (Elastic Block Store)
• GlusterFS - Multiple copy of data block
across storage. If a disk on a PM node
fails, another PM node will have access
to the copy of the data
High Availability
HA at PM node
• SAN/AWS EBS - When a PM node
goes down, the data volumes
attached to the failed PM node gets
attached to another PM
• Local Disks -If a PM node goes down,
the data on its disks are not available,
though queries continue on the
remaining data set
22. Where to find MariaDB ColumnStore?
SOFTWARE DOWNLOAD https://mariadb.com/downloads/columnstore
SOURCE https://github.com/mariadb-corporation/mariadb-columnstore-engine
DOCUMENTATION https://mariadb.com/kb/en/mariadb/mariadb-columnstore/
BLOGS https://mariadb.com/blog-tags/columnstore
</>