Query Optimization in
Apache Tajo
Jihoon Son / Gruter inc.
About Me
● Jihoon Son (@jihoonson)
○ Tajo project co-founder
○ Committer and PMC member of Apache Tajo
○ Research engineer at Gruter
Outline
● Introduction to Tajo
● Query processing in Tajo
○ Query plans in Tajo
○ Query processing example
● Query optimization in Tajo
○ Introduction to query optimization
○ Query optimization techniques in Tajo
What is Tajo?
● Apache Top-level Project
○ Data warehouse system
■ Efficient processing of analytic queries
■ ANSI-SQL compliant
○ Scalable and rapid query execution with its own engine
■ Distributed query processing
■ Fault tolerance
○ Beyond SQL-on-Hadoop
■ Support for various types of storage
● HDFS, S3, HBase, RDBMS, ...
Highlighted Features
● Support long-running batch queries as well as
interactive ad-hoc queries
○ Fast query processing
■ Optimized scan performance
● 120 MB/sec per physical disk (SATA)
○ Reliability
■ Fault tolerance
■ No single point of failure with HA support
Highlighted Features
● Support of various kinds of data sources
○ HDFS, Amazon S3, Google Cloud Storage, HBase,
RDBMS, ...
● Mature SQL support
○ Various kinds of join support
○ Window function support
○ Cost-based query optimization
● Integration with other systems
○ Notebooks like Zeppelin
○ BI tools
Recent Release: 0.11
● Feature highlights
○ Query federation
○ JDBC-based storage support
○ Self-describing data formats support
○ Multi-query support
○ More stable and efficient join execution
○ Index support
○ Python UDF/UDAF support
Architecture Overview
[Figure: clients (JDBC client, TSQL, Web UI, REST API) submit a query to a Tajo Master. Each Tajo Master includes a Catalog Server, which manages metadata and can be backed by a DBMS or HCatalog. The Tajo Master allocates the query to a Tajo Worker. Each Tajo Worker consists of a Query Master, a Query Executor, and a Storage Service on top of the storage; the Query Master sends tasks to the workers and monitors them.]
Query Execution Steps
● Initializing a query execution
○ ① Submit a query
○ ② Assign a query
○ ③ Build a query execution plan
[Figure: Tajo Client, Tajo Master with Catalog Server (backed by a DBMS), and three Tajo Workers, each with a Query Master.]
Query Execution Steps
● Executing a query
○ ④ Send tasks & monitor
○ ⑤ Read and process data
○ ⑥ Send status and progress
[Figure: Tajo Master, storage, and three Tajo Workers, each with a Query Executor and a Storage Service; one worker also runs the Query Master.]
Query Execution Steps
● Finalizing the query execution
○ ⑦ Store the result on storage
○ ⑧ Notify that query execution is completed
○ ⑨ Send the result location
○ ⑩ Read the result
[Figure: Tajo Client, Tajo Master, storage, and three Tajo Workers, each with a Query Executor and a Storage Service; one worker also runs the Query Master.]
Query Processing in Tajo
Query Execution Plan
● Given a user query, a query execution plan is an ordered set of steps to execute the query
○ Example
■ Read data from storage, then join on some join keys, and finally aggregate on some aggregation keys
● In Tajo, there are three kinds of query plans
○ The query master generates a logical query plan and a distributed query plan
○ The query executor of each Tajo worker generates a local query plan
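Query Planning Steps in Tajo
● Query Master
○ SQL → SQL Analyzer → Algebraic Expression
○ Algebraic Expression → Logical Planner → Logical Query Plan
○ Logical Query Plan → Global Planner → Distributed Query Plan
● The distributed query plan is distributed to Tajo workers
● Query Executor
○ Distributed Query Plan → Physical Planner → Local Query Plan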
Logical Query Plan
● A tree of relational algebra operators
● Example
< SQL >
SELECT item.brand, sum(price)
FROM sales, item
WHERE sales.item_key = item.item_key
GROUP BY item.brand
< Logical query plan >
Group by (key: brand, func: sum(price))
└ Join (key: item_key)
  ├ Scan on sales
  └ Scan on item
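A minimal sketch of how such a plan tree could be represented in code. The `PlanNode` class and the `render` helper are hypothetical names used only for illustration; they are not Tajo's actual plan classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """One relational operator in a logical plan tree (illustrative only)."""
    op: str                      # operator name, e.g. "Join"
    detail: str = ""             # annotation such as the key or function
    children: List["PlanNode"] = field(default_factory=list)


def render(node: PlanNode, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, root first."""
    label = f"{node.op} ({node.detail})" if node.detail else node.op
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# The logical plan of the example query on this slide.
logical_plan = PlanNode("Group by", "key: brand, func: sum(price)", [
    PlanNode("Join", "key: item_key", [
        PlanNode("Scan on sales"),
        PlanNode("Scan on item"),
    ]),
])

print("\n".join(render(logical_plan)))
```

The later planning steps can be thought of as adding annotations to nodes of this kind rather than changing the shape of the tree.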
Distributed Query Plan
● A plan with additional annotations for distributed execution
○ Data exchange (shuffle) keys, methods, ...
< Logical query plan >
Group by (key: brand, func: sum(price))
└ Join (key: item_key)
  ├ Scan on sales
  └ Scan on item
< Distributed query plan >
Group by (key: brand, func: sum(price))
└ [Range shuffle with brand]
  └ Join (key: item_key)
    ├ [Hash shuffle with item_key] ← Scan on sales
    └ [Hash shuffle with item_key] ← Scan on item
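The hash shuffle annotation can be illustrated with a small sketch. It assumes a simple "hash of the key modulo the number of partitions" scheme, which is a common way to implement hash partitioning and not necessarily what Tajo does internally; the function name and sample rows are made up.

```python
import zlib
from collections import defaultdict


def hash_shuffle(rows, key, num_partitions):
    """Assign each row to a partition by hashing its shuffle key."""
    partitions = defaultdict(list)
    for row in rows:
        # crc32 is used only because it is deterministic across runs.
        pid = zlib.crc32(str(row[key]).encode()) % num_partitions
        partitions[pid].append(row)
    return partitions


# Made-up sample data for the sales and item relations.
sales = [
    {"item_key": 1, "price": 10},
    {"item_key": 2, "price": 5},
    {"item_key": 1, "price": 7},
]
item = [
    {"item_key": 1, "brand": "A"},
    {"item_key": 2, "brand": "B"},
]

sales_parts = hash_shuffle(sales, "item_key", 3)
item_parts = hash_shuffle(item, "item_key", 3)
```

Because both relations are shuffled with the same key and the same function, rows with equal `item_key` end up in the same partition number, so each partition can be joined independently.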
Local Query Plan
● A plan with additional annotations for local execution
○ In-memory algorithm, disk-based algorithm, …
< Distributed query plan >
Group by (key: brand, func: sum(price))
└ [Range shuffle with brand]
  └ Join (key: item_key)
    ├ [Hash shuffle with item_key] ← Scan on sales
    └ [Hash shuffle with item_key] ← Scan on item
< Local query plan >
Group by (key: brand, func: sum(price)), algorithm: hash aggregation
└ [Range shuffle with brand]
  └ Join (key: item_key), algorithm: sort-merge join
    ├ [Hash shuffle with item_key] ← Scan on sales
    └ [Hash shuffle with item_key] ← Scan on item
Query Processing in Tajo
● A query is executed as a sequence of stages
○ A stage is the minimum unit of execution and contains at least one operator
● Each stage is processed in parallel by the query executors of multiple Tajo workers
[Figure: a join (key: item_key) over scans on item and sales, divided into Stage 1 and Stage 2.]
Query Processing Example
● SQL
SELECT item.brand, sum(price)
FROM sales, item
WHERE sales.item_key = item.item_key
GROUP BY item.brand
● Logical query plan
Group by (key: brand, func: sum(price))
└ Join (key: item_key)
  ├ Scan on sales
  └ Scan on item
Query Processing Example
● Logical query plan
● Distributed query plan
[Figure: the logical plan (group by on brand with sum(price), over a join on item_key, over scans on item and sales) next to the distributed plan, which adds a hash shuffle with item_key for each join input and a range shuffle with brand before the group by, and divides the plan into Stages 1 to 3.]
Query Processing Example
● Distributed query plan
● Distributed processing
[Figure: the distributed query plan with Stages 1 to 3. Five workers run scan tasks in parallel: two on item and three on sales.]
Query Processing Example
● Distributed query plan
● Distributed processing
[Figure: the output of the five scan workers (two on item, three on sales) is shuffled to five workers that run join tasks in parallel.]
Query Processing Example
● Distributed query plan
● Distributed processing
[Figure: the output of the five scan workers is shuffled to five join workers, and the join output is shuffled again to five workers that run group-by tasks in parallel.]
Query Optimization in Tajo
Query Optimization
● User queries are usually not written with performance in mind
● The query optimizer attempts to determine the most efficient way to execute a user query
○ It considers the possible query plans and chooses the best one
Extreme Example
● Query
○ select * from t where name like 'tajo%' order by id;
● Possible plans
○ Naive plan: Scan → Sort → Filter
■ Filtering out tuples after sort
■ Large cost for sort
○ Better plan: Scan with Filter → Sort
■ Filtering out tuples immediately after scan
■ Small cost for sort
■ Reduced number of operations
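The difference between the two plans can be made concrete with a small sketch. The table contents are made up, and the number of tuples fed to the sort is used as a crude stand-in for the sort cost.

```python
def naive_plan(rows):
    """Scan -> Sort -> Filter: every tuple is sorted before filtering."""
    sorted_rows = sorted(rows, key=lambda r: r["id"])
    result = [r for r in sorted_rows if r["name"].startswith("tajo")]
    return result, len(rows)        # second value: tuples fed to the sort


def better_plan(rows):
    """Scan with Filter -> Sort: only matching tuples reach the sort."""
    filtered = [r for r in rows if r["name"].startswith("tajo")]
    result = sorted(filtered, key=lambda r: r["id"])
    return result, len(filtered)    # second value: tuples fed to the sort


# Made-up contents of table t.
t = [
    {"id": 3, "name": "tajo-c"},
    {"id": 1, "name": "hive"},
    {"id": 2, "name": "tajo-a"},
    {"id": 4, "name": "spark"},
]

naive_result, naive_sorted = naive_plan(t)
better_result, better_sorted = better_plan(t)
```

Both plans return the same rows in the same order, but here the better plan sorts two tuples instead of four.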
Two Kinds of Query Optimization
● Rule-based optimization
○ A set of predefined rules is used to choose a good plan
○ Usually, heuristic approaches are used
■ E.g., filters should be pushed down as far as possible toward the bottom of the query plan
● Cost-based optimization
○ Enumerating possible query plans and choosing the one with the lowest cost
○ The cost function plays an important role
● Tajo utilizes both types of optimization
Query Optimization in Tajo
● Difference from traditional query optimization
○ Unlike in traditional database systems, pre-collected statistics are less important
■ Data may be added or updated by several systems, including Flume, Kafka, Tajo, …
■ Pre-collected statistics can be useful, but are not fully trustworthy
○ It is important to optimize query plans with minimal statistics
■ Volume of input relations
Query Optimization in Tajo
● Tajo has two different approaches for query
optimization
○ Static optimization
■ Traditional approach
■ Optimizing the plan during the query planning phase
○ Progressive optimization
■ Optimizing the plan based on the intermediate statistics
while executing the query
● A query plan can be optimized without pre-collected statistics
● Especially effective for queries that require multi-stage execution
Logical Query Plan Optimization
● Rule-based optimization
○ Access path rewrite rule
■ Choosing access path to data
■ Index scan has the highest priority if available
○ Distributivity rule
■ Reducing filters based on distributivity
○ Filter pushdown rule
■ Pushing filters down as far as possible
○ In-subquery rewrite rule
■ Transforming subqueries in 'IN' filters into semi (or anti) joins
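One plausible reading of "reducing filters based on distributivity" is factoring a predicate shared by every branch of an OR out of the disjunction, so that (A AND B) OR (A AND C) becomes A AND (B OR C). The sketch below illustrates only that reading; the function name and the set-based predicate representation are assumptions, not Tajo's rule implementation.

```python
def factor_common(disjuncts):
    """Rewrite (A AND B) OR (A AND C) as A AND (B OR C).

    `disjuncts` is a non-empty list of sets; each set is a conjunction
    of predicate strings. Returns (common, rest): the predicates shared
    by every branch, and what remains of each branch without them.
    """
    common = set.intersection(*map(set, disjuncts))
    rest = [set(d) - common for d in disjuncts]
    return common, rest


common, rest = factor_common([
    {"a.id = b.id", "a.x > 10"},
    {"a.id = b.id", "b.y < 5"},
])
```

Here the shared predicate `a.id = b.id` is pulled out of the OR, which leaves it as a plain conjunct that other rules, such as filter pushdown, could then handle on its own.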
Logical Query Plan Optimization
● Rule-based optimization (cont'd)
○ Projection pushdown rule
■ Pushing projections down as far as possible
● Cost-based optimization
○ Join order optimization
■ Finding the join order with the lowest cost
■ Greedy heuristic: ordering relations from small ones to large ones
● Very effective in a single-node environment
● Needs improvement for parallel computing environments
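The greedy heuristic can be sketched in a few lines, assuming the only available statistic is an estimated volume per relation. The relation names and sizes are made up, and the sketch ignores join predicates, which a real optimizer would also have to respect.

```python
def greedy_join_order(relation_sizes):
    """Order relations from smallest to largest estimated volume."""
    return sorted(relation_sizes, key=relation_sizes.get)


# Made-up volume estimates (unit is arbitrary, e.g. MB).
sizes = {"lineitem": 700_000, "nation": 2, "region": 1, "orders": 170_000}

order = greedy_join_order(sizes)
```

Joining the small relations first tends to keep intermediate results small, which is the intuition behind ordering relations this way.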
Distributed Query Plan Optimization
● Rule-based optimization
○ Two-phase execution of operators
■ Operators that require data shuffling, such as aggregation, join, or sort, are executed in two phases
■ The first phase performs local computation to reduce the amount of shuffled data
■ The second phase computes the final result of the operation
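Two-phase execution can be illustrated for a sum aggregation. In this sketch each inner list stands for the rows seen by one worker; the function names and data are made up.

```python
from collections import Counter


def local_group_by(rows):
    """First phase: partial sums per key, computed on one worker."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return partial


def final_group_by(partials):
    """Second phase: merge the partial sums received after the shuffle."""
    total = Counter()
    for partial in partials:
        total.update(partial)   # Counter.update adds values per key
    return dict(total)


# Made-up (brand, price) rows held by two workers.
worker_inputs = [
    [("A", 10), ("B", 5), ("A", 7)],
    [("B", 1), ("A", 2)],
]

partials = [local_group_by(rows) for rows in worker_inputs]
shuffled_pairs = sum(len(p) for p in partials)
result = final_group_by(partials)
```

The first phase shrinks five input rows to four partial sums before anything crosses the network; with many rows per key the reduction is far larger.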
Two-phase Execution Example
● Logical query plan
○ Scan → Group by → Sort
● Distributed query plan
○ Stage 1: Scan → Local group by
○ Stage 2: Group by → Local sort
○ Stage 3: Sort
Distributed Query Plan Optimization
● Distributed join algorithm selection
○ Two representative distributed join algorithms
■ Join cannot be performed within a single stage in distributed systems
● Tuples with the same join key may be distributed over cluster nodes
■ Repartition join
● Both input relations are shuffled on the join key columns
■ Broadcast join
● Small relations are broadcast to every node before the join
Example of Repartition Join
● select … from employee e, department d where e.DeptName = d.DeptName
Example of Broadcast Join
● select … from employee e, department d where e.DeptName = d.DeptName
Distributed Join Algorithm Selection
● Repartition join vs. broadcast join
○ Given a set of joins, some parts can be executed with broadcast join while the remaining parts are executed with repartition join
● Which parts will be executed with broadcast join?
○ Greedy heuristic: broadcast join is used as much as possible
■ The size of each input relation should be smaller than a pre-defined threshold
■ The total volume of broadcast relations should not exceed a pre-defined threshold
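A minimal sketch of the greedy selection, under the two threshold conditions stated above. The function name, parameter names, and sizes are hypothetical; Tajo's actual configuration keys and selection order are not given in these slides.

```python
def choose_broadcast(relation_sizes, per_relation_limit, total_limit):
    """Greedily mark relations for broadcast, smallest first."""
    broadcast, total = [], 0
    for name, size in sorted(relation_sizes.items(), key=lambda kv: kv[1]):
        if size >= per_relation_limit:
            break               # every remaining relation is at least as large
        if total + size > total_limit:
            break               # the total broadcast volume would be exceeded
        broadcast.append(name)
        total += size
    repartition = [n for n in relation_sizes if n not in broadcast]
    return broadcast, repartition


# Made-up relation sizes in MB.
sizes_mb = {"lineitem": 7000, "nation": 1, "region": 1}

broadcast, repartition = choose_broadcast(
    sizes_mb, per_relation_limit=5, total_limit=10
)
```

With these numbers the two small relations are broadcast and `lineitem` is left for repartition join.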
Distributed Join Algorithm Selection Example
● select … from lineitem, nation, region …
Local Query Plan Optimization
● Selecting the best algorithm based on the current
resource status
○ Aggregation
■ Hash aggregation, sort aggregation
○ Join
■ Hash join, sort-merge join
● For sort, hash sort is used by default, spilling data to disk when it doesn't fit into memory
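The slides do not say which resource measurements drive the choice, so the sketch below assumes one simple criterion: use the in-memory hash join when the smaller input fits into the memory currently available, and fall back to sort-merge join otherwise. The function and parameter names are hypothetical.

```python
def choose_join_algorithm(smaller_input_bytes, available_memory_bytes):
    """Pick a local join algorithm from a single, assumed criterion."""
    if smaller_input_bytes <= available_memory_bytes:
        return "hash join"        # the hash table fits in memory
    return "sort-merge join"      # assumed fallback when it does not fit


MB = 1024 ** 2
small_case = choose_join_algorithm(64 * MB, 512 * MB)
large_case = choose_join_algorithm(2048 * MB, 512 * MB)
```

The same kind of check could be applied to the choice between hash aggregation and sort aggregation.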
Progressive Optimization
● Data repartition
○ Some operators, such as join or aggregation, require data to be shuffled by key
○ The number of partitions produced by a shuffle should be chosen carefully
■ The number of partitions is related to the number of tasks of the next stage
● At the beginning of each stage, the number of partitions is decided based on the input size
Progressive Optimization Example
● If the default task size is 1GB,
○ Stage 1: Scan on item (100GB) → # of partitions: 100, # of tasks: 100
○ Stage 2: Group by (50GB) → # of partitions: 50, # of tasks: 50
○ Stage 3: Sort
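The arithmetic in this example can be sketched as a ceiling division of the input size by the task size. Treating the rule as exactly this formula is an assumption; the slides only give the two data points.

```python
def num_partitions(input_bytes, task_bytes):
    """Ceiling of input size over task size, with at least one partition."""
    return max(1, (input_bytes + task_bytes - 1) // task_bytes)


GB = 1024 ** 3
after_scan = num_partitions(100 * GB, 1 * GB)      # scan output: 100GB
after_group_by = num_partitions(50 * GB, 1 * GB)   # group-by output: 50GB
```

Because the second value is computed from the size actually produced by the group by, it needs no statistics collected before the query starts.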
Future Work
● Adding more optimization methods
● Improving cost functions for more effective cost-based optimization
● Adding new approaches for progressive optimization
○ Runtime query rewriting
○ Integrating with genetic algorithms
○ …
Get Involved!
● General
○ http://tajo.apache.org
● Getting Started
○ http://tajo.apache.org/docs/current/getting_started.html
● Downloads
○ http://tajo.apache.org/downloads.html
● Jira – Issue Tracker
○ https://issues.apache.org/jira/browse/TAJO
● Join the mailing list
○ dev-subscribe@tajo.apache.org
○ issues-subscribe@tajo.apache.org
Thanks!

2. Operations Strategy in a Global Environment.ppt
PuktoonEngr
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
IJECEIAES
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
SUTEJAS
 
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
Mukeshwaran Balu
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
NidhalKahouli2
 
Exception Handling notes in java exception
Exception Handling notes in java exceptionException Handling notes in java exception
Exception Handling notes in java exception
Ratnakar Mikkili
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
SyedAbiiAzazi1
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
RadiNasr
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
VICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

Low power architecture of logic gates using adiabatic techniques
Low power architecture of logic gates using adiabatic techniquesLow power architecture of logic gates using adiabatic techniques
Low power architecture of logic gates using adiabatic techniques
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
 
Wearable antenna for antenna applications
Wearable antenna for antenna applicationsWearable antenna for antenna applications
Wearable antenna for antenna applications
 
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTCHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
 
bank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdfbank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdf
 
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSA SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
 
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
 
New techniques for characterising damage in rock slopes.pdf
New techniques for characterising damage in rock slopes.pdfNew techniques for characterising damage in rock slopes.pdf
New techniques for characterising damage in rock slopes.pdf
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
 
2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
 
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
 
Exception Handling notes in java exception
Exception Handling notes in java exceptionException Handling notes in java exception
Exception Handling notes in java exception
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
 

Query optimization in Apache Tajo

  • 1. Query Optimization in Apache Tajo Jihoon Son / Gruter inc.
  • 2. About Me ● Jihoon Son (@jihoonson) ○ Tajo project co-founder ○ Committer and PMC member of Apache Tajo ○ Research engineer at Gruter
  • 3. Outline ● Introduction to Tajo ● Query processing in Tajo ○ Query plans in Tajo ○ Query processing example ● Query optimization in Tajo ○ Introduction to query optimization ○ Query optimization techniques in Tajo
  • 4. What is Tajo? ● Apache Top-level Project ○ Data warehouse system ■ Efficient processing of analytic queries ■ ANSI-SQL compliant ○ Scalable and rapid query execution with its own engine ■ Distributed query processing ■ Fault tolerance ○ Beyond SQL-on-Hadoop ■ Supports various types of storage ● HDFS, S3, HBase, RDBMS, ...
  • 5. Highlighted Features ● Support for long-running batch queries as well as interactive ad-hoc queries ○ Fast query processing ■ Optimized scan performance ● 120 MB/sec per physical disk (SATA) ○ Reliability ■ Fault tolerance ■ No single point of failure with HA support
  • 6. Highlighted Features ● Support for various kinds of data sources ○ HDFS, Amazon S3, Google Cloud Storage, HBase, RDBMS, ... ● Mature SQL support ○ Various kinds of join support ○ Window function support ○ Cost-based query optimization ● Integration with other systems ○ Notebooks like Zeppelin ○ BI tools
  • 7. Recent Release: 0.11 ● Feature highlights ○ Query federation ○ JDBC-based storage support ○ Self-describing data format support ○ Multi-query support ○ More stable and efficient join execution ○ Index support ○ Python UDF/UDAF support
  • 8. Architecture Overview [Diagram] ● Components: clients (JDBC client, TSQL, Web UI, REST API); Tajo Masters, each with a Catalog Server; DBMS / HCatalog; Tajo Workers, each with a Query Master, Query Executor, and Storage Service; Storage ● Arrow labels: Submit a query; Manage metadata; Allocate a query; Send tasks & monitor
  • 9. Query Execution Steps ● Initializing a query execution ○ ① Submit a query ○ ② Assign a query ○ ③ Build a query execution plan [Diagram: Tajo Client, Tajo Master with Catalog Server and DBMS, Tajo Workers with Query Masters]
  • 10. Query Execution Steps ● Executing a query ○ ④ Send tasks & monitor ○ ⑤ Read and process data ○ ⑥ Send status and progress [Diagram: Tajo Master, Tajo Workers with Query Executor and Storage Service (one also running the Query Master), Storage]
  • 11. Query Execution Steps ● Finalizing the query execution ○ ⑦ Store the result on storage ○ ⑧ Notify that query execution is completed ○ ⑨ Send the result location ○ ⑩ Read the result [Diagram: Tajo Client, Tajo Master, Tajo Workers with Query Executor and Storage Service (one also running the Query Master), Storage]
  • 13. Query Execution Plan ● Given a user query, a query execution plan is an ordered set of steps to execute the query ○ Example ■ Read data from storage, then join on some join keys, and finally aggregate with some aggregation keys ● In Tajo, there are three kinds of query plans ○ The query master generates a logical query plan and a distributed query plan ○ The query executor of each Tajo worker generates a local query plan
  • 14. Query Planning Steps in Tajo ● SQL → SQL Analyzer → Algebraic Expression → Logical Planner → Logical Query Plan → Global Planner → Distributed Query Plan → Physical Planner → Local Query Plan ● Diagram labels: Query Master; Distributed to Tajo workers; Query Executor
  • 15. Logical Query Plan ● A tree of relational algebra operators ● Example ○ < SQL > SELECT item.brand, sum(price) FROM sales, item WHERE sales.item_key = item.item_key GROUP BY item.brand ○ < Logical query plan > Group by (key: brand, func: sum(price)) ← Join (key: item_key) ← Scan on item, Scan on sales
  • 16. Distributed Query Plan ● A plan with additional annotations for distributed execution ○ Data exchange (shuffle) keys, methods, ... ● Example ○ < Logical query plan > Group by (key: brand, func: sum(price)) ← Join (key: item_key) ← Scan on item, Scan on sales ○ < Distributed query plan > The same tree annotated with shuffles: a hash shuffle with item_key on each input of the Join, and a range shuffle with brand into the Group by
  • 17. Local Query Plan ● A plan with additional annotations for local execution ○ In-memory algorithm, disk-based algorithm, … ● Example ○ < Distributed query plan > Group by (key: brand, func: sum(price)) ← Join (key: item_key) ← Scan on item, Scan on sales, with a hash shuffle with item_key on each input of the Join and a range shuffle with brand into the Group by ○ < Local query plan > The same plan annotated with local algorithms: Sort-merge join, Hash aggregation
  • 18. Query Processing in Tajo ● A query is executed by running multiple stages sequentially ○ A stage is the minimum unit of execution and runs at least one operator ● Each stage is processed in parallel by multiple query executors on Tajo workers [Diagram: Join (key: item_key) over Scan on item and Scan on sales, split into Stage 1 and Stage 2]
  • 19. Query Processing Example ● SQL ○ SELECT item.brand, sum(price) FROM sales, item WHERE sales.item_key = item.item_key GROUP BY item.brand ● Logical query plan ○ Group by (key: brand, func: sum(price)) ← Join (key: item_key) ← Scan on item, Scan on sales
  • 20. Query Processing Example ● Logical query plan ○ Group by (key: brand, func: sum(price)) ← Join (key: item_key) ← Scan on item, Scan on sales ● Distributed query plan ○ The same tree divided into Stage 1, Stage 2, and Stage 3, with a hash shuffle with item_key on each input of the Join and a range shuffle with brand into the Group by
  • 21. Query Processing Example ● Distributed query plan ○ Stage 1, Stage 2, and Stage 3, with hash shuffles with item_key and a range shuffle with brand ● Distributed processing ○ Five workers scan in parallel: two over item and three over sales
  • 22. Query Processing Example ● Distributed query plan ○ Stage 1, Stage 2, and Stage 3, with hash shuffles with item_key and a range shuffle with brand ● Distributed processing ○ Five workers scan item and sales ○ The scan output is shuffled to five workers that run the Join
  • 23. Query Processing Example ● Distributed query plan ○ Stage 1, Stage 2, and Stage 3, with hash shuffles with item_key and a range shuffle with brand ● Distributed processing ○ Five workers scan item and sales ○ The scan output is shuffled to five workers that run the Join ○ The join output is shuffled to five workers that run the Group by
  • 25. Query Optimization ● Most user queries are not optimized for performance ● The query optimizer attempts to determine the most efficient way to execute a user query ○ It considers the possible query plans and chooses the best one
  • 26. Extreme Example ● Query ○ select * from t where name like 'tajo%' order by id; ● Possible plans ○ Naive plan: Scan → Sort → Filter ■ Filtering out tuples after sort ■ Large cost for sort ○ Better plan: Scan with Filter → Sort ■ Filtering out tuples immediately after scan ■ Small cost for sort ■ Reduced number of operations
  • 27. Two Kinds of Query Optimization ● Rule-based optimization ○ A set of predefined rules is used to choose a good plan ○ Usually, heuristic approaches are used ■ e.g., filters should be pushed down to the lower part of the query plan as much as possible ● Cost-based optimization ○ Enumerating possible query plans and choosing the one with the lowest cost ○ The cost function plays an important role ● Tajo utilizes both types of optimization
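The difference between the two plans on this slide can be sketched in Python. This is a toy illustration rather than Tajo code: the rows of `t`, the function names, and the use of `startswith("tajo")` as a stand-in for `like 'tajo%'` are all assumptions made for the example.

```python
# Toy illustration of the two plans for:
#   select * from t where name like 'tajo%' order by id;
# The rows and function names are hypothetical, not part of Tajo.

def naive_plan(rows):
    """Scan -> Sort -> Filter: every tuple is sorted before filtering."""
    sorted_rows = sorted(rows, key=lambda r: r["id"])
    result = [r for r in sorted_rows if r["name"].startswith("tajo")]
    return result, len(rows)  # second value: number of tuples sorted


def better_plan(rows):
    """Scan with Filter -> Sort: only tuples passing the filter are sorted."""
    filtered = [r for r in rows if r["name"].startswith("tajo")]
    result = sorted(filtered, key=lambda r: r["id"])
    return result, len(filtered)  # second value: number of tuples sorted


t = [
    {"id": 3, "name": "tajo-worker"},
    {"id": 1, "name": "hive"},
    {"id": 2, "name": "tajo-master"},
    {"id": 4, "name": "spark"},
]

naive_result, naive_sorted = naive_plan(t)
better_result, better_sorted = better_plan(t)
```

Both functions return the same rows; the second simply hands fewer tuples to the sort, which is where the cost saving on the slide comes from.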
  • 28. Query Optimization in Tajo ● Difference from traditional query optimization ○ Unlike in traditional database systems, pre-collected statistics are not so important ■ Data may be added or updated by several systems including Flume, Kafka, Tajo, … ■ Pre-collected statistics can be useful, but are not fully trustworthy ○ It is important to optimize query plans with minimal statistics ■ Volume of input relations
  • 29. Query Optimization in Tajo ● Tajo has two different approaches for query optimization ○ Static optimization ■ Traditional approach ■ Optimizing the plan during the query planning phase ○ Progressive optimization ■ Optimizing the plan based on the intermediate statistics while executing the query ● A query plan can be optimized without pre-collected statistics ● Especially effective for queries that require multi-stage execution
  • 30. Logical Query Plan Optimization ● Rule-based optimization ○ Access path rewrite rule ■ Choosing the access path to data ■ Index scan has the highest priority if available ○ Distributivity rule ■ Reducing filters based on distributivity ○ Filter pushdown rule ■ Pushing down filters to the lowest part as much as possible ○ In-subquery rewrite rule ■ Transforming subqueries in 'IN' filters to semi (anti) joins
  • 31. Logical Query Plan Optimization ● Rule-based optimization (cont'd) ○ Projection pushdown rule ■ Pushing down projections to the lowest part as much as possible ● Cost-based optimization ○ Join order optimization ■ Finding the join order with the lowest cost ■ Greedy heuristic: ordering relations from small ones to large ones ● Very effective in a single-node computing environment ● Needs improvement for parallel computing environments
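A minimal Python sketch of the greedy heuristic named on this slide (ordering relations from small to large). It is not Tajo's join-order optimizer: the relation sizes are made up, the function names are hypothetical, and join predicates are ignored for brevity, so a real optimizer would also have to avoid unintended cross products.

```python
# Hypothetical sketch: order relations by estimated volume, smallest first,
# then fold them into a left-deep join tree.

def greedy_join_order(relation_sizes):
    """Return relation names ordered from smallest to largest volume."""
    ordered = sorted(relation_sizes.items(), key=lambda kv: kv[1])
    return [name for name, _ in ordered]


def left_deep_plan(order):
    """Fold the ordered relations into a left-deep join tree (nested tuples)."""
    plan = order[0]
    for name in order[1:]:
        plan = ("join", plan, name)
    return plan


# Made-up input volumes in bytes.
sizes = {"lineitem": 700_000_000, "nation": 2_000, "region": 400}

order = greedy_join_order(sizes)
plan = left_deep_plan(order)
```

Joining the smallest relations first tends to keep intermediate results small, which is the intuition behind the heuristic; the only statistic it needs is the volume of each input relation.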
  • 32. Distributed Query Plan Optimization ● Rule-based optimization ○ Two-phase execution of operators ■ Operators that require data shuffling, like aggregation, join, or sort, are executed in two phases ■ The first phase computes locally to reduce the amount of shuffled data ■ The second phase produces the result of the operation
  • 33. Two-phase Execution Example ● Logical query plan ○ Scan → Group by → Sort ● Distributed query plan ○ The same plan split into Stage 1, Stage 2, and Stage 3, with a Local group by and a Local sort added ahead of the shuffled Group by and Sort
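Two-phase execution of an aggregation can be sketched as follows. This is an illustrative Python model with hypothetical function names and made-up data, assuming a `sum` aggregate; it is not how Tajo implements its operators.

```python
from collections import defaultdict

# Hypothetical model of a two-phase sum aggregation.

def local_group_by(partition):
    """Phase 1: each worker pre-aggregates its own tuples before the shuffle."""
    partial = defaultdict(int)
    for key, value in partition:
        partial[key] += value
    return dict(partial)


def final_group_by(partials):
    """Phase 2: merge the partial sums that arrive after the shuffle."""
    total = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


# Two input partitions of (group key, value) tuples, as seen by two workers.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("a", 5)],
]

partials = [local_group_by(p) for p in partitions]
result = final_group_by(partials)

tuples_before = sum(len(p) for p in partitions)   # tuples without phase 1
tuples_shuffled = sum(len(p) for p in partials)   # tuples actually shuffled
```

In this small input the local phase cuts the shuffled tuples from five to four; the saving grows with the number of rows per group on each worker.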
  • 34. Distributed Query Plan Optimization ● Distributed join algorithm selection ○ Two representative distributed join algorithms ■ Join cannot be performed within a single stage in distributed systems ● Tuples of the same join key may be distributed over cluster nodes ■ Repartition join ● Both input relations are shuffled with the join key columns ■ Broadcast join ● Small relations are broadcast to every node before the join
  • 35. Example of Repartition Join ● select … from employee e, department d where e.DeptName = d.DeptName
  • 36. Example of Broadcast Join ● select … from employee e, department d where e.DeptName = d.DeptName
  • 37. Distributed Join Algorithm Selection ● Repartition join vs. broadcast join ○ Given a set of joins, some parts can be executed with broadcast join while the remaining parts are executed with repartition join ● Which parts will be executed with broadcast join? ○ Greedy heuristic: broadcast join is used as much as possible ■ The size of the input relation should be smaller than a predefined threshold ■ The total volume of broadcast relations should not exceed a predefined threshold
  • 38. Distributed Join Algorithm Selection Example ● select … from lineitem, nation, region …
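The two join strategies can be contrasted with a small in-process simulation. The sketch below is only a model of the idea: the nodes are plain Python lists, the employee and department rows are invented, and none of the function names come from Tajo.

```python
from collections import defaultdict

# Hypothetical simulation; each "node" is just a list of (DeptName, value) rows.

def hash_join(left, right):
    """Local hash join on the first field (the join key) of each row."""
    table = defaultdict(list)
    for key, value in right:
        table[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in table.get(key, [])]


def shuffle(parts, num_nodes):
    """Redistribute rows so that equal join keys land on the same node."""
    shuffled = [[] for _ in range(num_nodes)]
    for part in parts:
        for row in part:
            shuffled[hash(row[0]) % num_nodes].append(row)
    return shuffled


def repartition_join(left_parts, right_parts, num_nodes):
    """Shuffle both inputs by join key, then join node by node."""
    out = []
    for l, r in zip(shuffle(left_parts, num_nodes), shuffle(right_parts, num_nodes)):
        out.extend(hash_join(l, r))
    return out


def broadcast_join(large_parts, small_relation):
    """Copy the whole small relation to every node; the large input stays put."""
    out = []
    for part in large_parts:
        out.extend(hash_join(part, small_relation))
    return out


employee_parts = [
    [("Sales", "Kim"), ("Eng", "Lee")],
    [("Eng", "Park"), ("HR", "Choi")],
]
department_parts = [[("Sales", "Seoul")], [("Eng", "Daejeon")]]
department = [row for part in department_parts for row in part]

repartition_result = repartition_join(employee_parts, department_parts, num_nodes=2)
broadcast_result = broadcast_join(employee_parts, department)
```

Both strategies produce the same joined rows. They differ in what moves over the network: the repartition join moves every row of both inputs, while the broadcast join moves only the small relation, once per node.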
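One way to read the greedy heuristic and its two thresholds is sketched below in Python. The smallest-first visiting order, the function name, the threshold values, the sizes, and the extra `supplier` relation are all assumptions for illustration; the slides do not specify how Tajo iterates over the candidates.

```python
# Hypothetical sketch of greedy broadcast selection under two thresholds.

def choose_broadcast(relation_sizes, per_relation_limit, total_limit):
    """Pick relations to broadcast, visiting candidates smallest first."""
    chosen, total = [], 0
    for name, size in sorted(relation_sizes.items(), key=lambda kv: kv[1]):
        if size >= per_relation_limit:
            break  # this relation and all later (larger) ones are too big
        if total + size > total_limit:
            break  # adding it would exceed the total broadcast volume
        chosen.append(name)
        total += size
    return chosen


# Made-up sizes in MB.
sizes_mb = {"lineitem": 75_000, "supplier": 4, "nation": 3, "region": 1}

broadcast = choose_broadcast(sizes_mb, per_relation_limit=5, total_limit=5)
repartition = [name for name in sizes_mb if name not in broadcast]
```

Here `supplier` is below the per-relation threshold but is still left to a repartition join, because broadcasting it would push the total broadcast volume past the second threshold.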
  • 39. Local Query Plan Optimization ● Selecting the best algorithm based on the current resource status ○ Aggregation ■ Hash aggregation, sort aggregation ○ Join ■ Hash join, sort-merge join ● For sort, hash sort is basically used, spilling data to disk when it doesn't fit into memory
  • 40. Progressive Optimization ● Data repartition ○ Some operators, like join or aggregation, require shuffling data by key ○ The number of partitions produced by a shuffle should be decided carefully ■ The number of partitions is related to the number of tasks of the next stage ● At the beginning of each stage, the number of partitions is decided based on the input size
  • 41. Progressive Optimization Example ● Plan: Scan on item → Group by → Sort, split into Stage 1, Stage 2, and Stage 3 ● If the default task size is 1GB: ○ Scan on item (100GB): # of partitions: 100, # of tasks: 100 ○ Group by (50GB): # of partitions: 50, # of tasks: 50
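The arithmetic behind this example can be written out as a short Python sketch. The function name and the rounding rule (round up, at least one partition) are assumptions; the slide gives only the 1GB default task size and the resulting counts.

```python
import math

# Hypothetical helper: one partition (and hence one next-stage task)
# per `task_size_gb` of input, rounded up.

def num_partitions(input_size_gb, task_size_gb=1.0):
    """Decide the partition count from the observed input size of a stage."""
    return max(1, math.ceil(input_size_gb / task_size_gb))


after_scan = num_partitions(100)     # Scan on item: 100GB
after_group_by = num_partitions(50)  # Group by: 50GB
```

Because the count is computed at the beginning of each stage from the size actually observed, the later stage gets 50 tasks rather than inheriting the 100 chosen for the earlier one.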
  • 42. Future Work ● Adding more optimization methods ● Improving cost functions for more effective cost-based optimization ● Adding new approaches for progressive optimization ○ Runtime query rewriting ○ Integrating with genetic algorithm ○ …
  • 43. Get Involved! ● General ○ http://tajo.apache.org ● Getting Started ○ http://tajo.apache.org/docs/current/getting_started.html ● Downloads ○ http://tajo.apache.org/downloads.html ● Jira – Issue Tracker ○ https://issues.apache.org/jira/browse/TAJO ● Join the mailing list ○ dev-subscribe@tajo.apache.org ○ issues-subscribe@tajo.apache.org