This document discusses query optimization in Apache Tajo. It describes how Tajo generates logical, distributed, and local query plans and optimizes them using rule-based and cost-based techniques. Some key optimization techniques include pushing down filters, selecting join algorithms, and optimizing data partitioning progressively during query execution based on intermediate statistics. The document provides examples of query planning and optimization in Tajo.
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine (DataWorks Summit)
This document discusses query optimization and just-in-time (JIT)-based vectorized execution in Apache Tajo. It outlines Tajo's query optimization techniques, including join order optimization and progressive optimization. It also describes Tajo's new JIT-based vectorized query execution engine, which improves performance by using vectorized processing, unsafe memory structures for vectors, and JIT compilation of vectorization primitives. The speaker is a director of research at Gruter who contributes to Apache Tajo and Apache Giraph.
2. About Me
● Jihoon Son (@jihoonson)
○ Tajo project co-founder
○ Committer and PMC member of Apache Tajo
○ Research engineer at Gruter
3. Outline
● Introduction to Tajo
● Query processing in Tajo
○ Query plans in Tajo
○ Query processing example
● Query optimization in Tajo
○ Introduction to query optimization
○ Query optimization techniques in Tajo
4. What is Tajo?
● Apache Top-level Project
○ Data warehouse system
■ Efficient processing of analytic queries
■ ANSI-SQL compliant
○ Scalable and rapid query execution with its own engine
■ Distributed query processing
■ Fault tolerance
○ Beyond SQL-on-Hadoop
■ Support for various types of storage
● HDFS, S3, HBase, RDBMS, ...
5. Highlighted Features
● Support for long-running batch queries as well as interactive ad-hoc queries
○ Fast query processing
■ Optimized scan performance
● 120 MB/sec per physical disk (SATA)
○ Reliability
■ Fault tolerance
■ No single point of failure with HA support
6. Highlighted Features
● Support of various kinds of data sources
○ HDFS, Amazon S3, Google Cloud Storage, HBase, RDBMS, ...
● Mature SQL support
○ Various kinds of join support
○ Window function support
○ Cost-based query optimization
● Integration with other systems
○ Notebooks like Zeppelin
○ BI tools
7. Recent Release: 0.11
● Feature highlights
○ Query federation
○ JDBC-based storage support
○ Self-describing data formats support
○ Multi-query support
○ More stable and efficient join execution
○ Index support
○ Python UDF/UDAF support
8. Architecture Overview
[Architecture diagram] The Tajo Master hosts the Catalog Server, which manages metadata backed by a DBMS or HCatalog. Each Tajo Worker runs a Query Master, a Query Executor, and a Storage Service on top of the underlying storage. Clients (TSQL, WebUI, a JDBC client, or the REST API) submit a query to the Tajo Master, which allocates the query to a Query Master; the Query Master sends tasks to the workers and monitors them.
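Since JDBC is one of the client interfaces in the diagram, here is a hedged example of submitting a query to Tajo over JDBC; the driver class and URL scheme follow the Tajo documentation, while the host, port, and database names are placeholders to adapt to your deployment.

```java
import java.sql.*;

// Submits a query to the Tajo Master through the JDBC interface.
// Driver class and URL scheme as documented for Tajo; the host,
// port, and database here are placeholder values.
public class TajoJdbcExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.tajo.jdbc.TajoDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:tajo://127.0.0.1:26002/default");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT item.brand, sum(price) FROM sales, item "
                 + "WHERE sales.item_key = item.item_key GROUP BY item.brand")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```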
9. Query Execution Steps
● Initializing a query execution
[Diagram] ① The Tajo Client submits a query to the Tajo Master. ② The Tajo Master assigns the query to a Query Master on one of the Tajo Workers. ③ That Query Master builds a query execution plan.
10. Query Execution Steps
● Executing a query
[Diagram] ④ The Query Master sends tasks to the Query Executors on the Tajo Workers and monitors them. ⑤ The Query Executors read and process data through their Storage Service. ⑥ The Query Executors send status and progress back to the Query Master.
11. Query Execution Steps
● Finalizing the query execution
[Diagram] ⑦ The Query Executors store the result on storage. ⑧ The Tajo Master is notified that query execution is completed. ⑨ The result location is sent to the Tajo Client. ⑩ The Tajo Client reads the result from storage.
13. Query Execution Plan
● Given a user query, a query execution plan is an ordered set of steps to execute the query
○ Example
■ Read data from storage, then join on some join keys, and finally aggregate with some aggregation keys
● In Tajo, there are three kinds of query plans
○ The Query Master generates a logical query plan and a distributed query plan
○ The Query Executor of each Tajo Worker generates a local query plan
14. Query Planning Steps in Tajo
< Diagram: planning pipeline >
SQL → SQL Analyzer → Algebraic Expression → Logical Planner → Logical Query Plan → Global Planner → Distributed Query Plan → Physical Planner → Local Query Plan
● The SQL Analyzer, Logical Planner, and Global Planner run in the Query Master
● The distributed query plan is distributed to Tajo workers, where the Physical Planner of each Query Executor generates a local query plan
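To make the pipeline concrete, here is a minimal sketch in Python (illustrative only: all names are invented, the logical plan is hard-coded to the running sales/item example, and Tajo's real planner is written in Java):

def analyze(sql):
    # SQL text -> algebraic expression (here: just a token list)
    return sql.replace(",", " ").split()

def plan_logical(expr):
    # algebraic expression -> logical plan (tree of operators);
    # hard-coded to the running example for brevity
    return ("group_by", ("join", ("scan", "item"), ("scan", "sales")))

def plan_global(logical):
    # logical plan -> distributed plan: annotate shuffle boundaries
    return {"plan": logical, "shuffles": ["hash(item_key)", "range(brand)"]}

def plan_physical(distributed):
    # distributed plan -> local plan: pick concrete algorithms
    return {**distributed, "algorithms": {"join": "sort-merge", "group_by": "hash"}}

sql = "SELECT item.brand, sum(price) FROM sales, item WHERE sales.item_key = item.item_key GROUP BY item.brand"
local_plan = plan_physical(plan_global(plan_logical(analyze(sql))))
print(local_plan["algorithms"])  # {'join': 'sort-merge', 'group_by': 'hash'}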
15. Logical Query Plan
● A tree of relational algebra operators
● Example
< SQL >
SELECT item.brand, sum(price)
FROM sales, item
WHERE sales.item_key = item.item_key
GROUP BY item.brand;
< Logical query plan >
Group by (key: brand, func: sum(price))
  Join (key: item_key)
    Scan on item
    Scan on sales
16. Distributed Query Plan
● A plan with additional annotations for distributed execution
○ Data exchange (shuffle) keys, methods, ...
< Logical query plan >
Group by (key: brand, func: sum(price))
  Join (key: item_key)
    Scan on item
    Scan on sales
< Distributed query plan >
Group by (key: brand, func: sum(price)), input via range shuffle with brand
  Join (key: item_key)
    Scan on item, output via hash shuffle with item_key
    Scan on sales, output via hash shuffle with item_key
17. Local Query Plan
● A plan with additional annotations for local execution
○ In-memory algorithm, disk-based algorithm, …
< Distributed query plan > (as on the previous slide)
< Local query plan > (adds the chosen algorithms)
Group by (key: brand, func: sum(price)): hash aggregation
  Join (key: item_key): sort-merge join
    Scan on item, output via hash shuffle with item_key
    Scan on sales, output via hash shuffle with item_key
18. Query Processing in Tajo
● A query is executed as a sequence of stages
○ A stage is the minimum unit of execution and contains at least one operator
● Each stage is processed in parallel by the Query Executors of multiple Tajo workers
< Example >
Stage 2: Join (key: item_key)
Stage 1: Scan on item, Scan on sales
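To illustrate how a plan splits into stages, here is a toy sketch in Python (the tuple-based plan encoding and the "shuffle" marker are invented for this example, not Tajo's internal representation):

# plan as nested tuples; a "shuffle" node marks a stage boundary
plan = ("group_by",
        ("shuffle", "brand",
         ("join",
          ("shuffle", "item_key", ("scan", "item")),
          ("shuffle", "item_key", ("scan", "sales")))))

def split_stages(node, stages=None):
    # walk the tree; every subtree under a shuffle becomes its own stage
    if stages is None:
        stages = []
    children = [c for c in node[1:] if isinstance(c, tuple)]
    for child in children:
        split_stages(child, stages)
    if node[0] == "shuffle":
        stages.append(node[2][0])  # record the top operator of the child stage
    return stages

print(split_stages(plan))  # -> ['scan', 'scan', 'join'], plus the final group_by stage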
19. Query Processing Example
● SQL
SELECT item.brand, sum(price)
FROM sales, item
WHERE sales.item_key = item.item_key
GROUP BY item.brand;
● Logical query plan
Group by (key: brand, func: sum(price))
  Join (key: item_key)
    Scan on item
    Scan on sales
20. Query Processing Example
● Logical query plan (as above)
● Distributed query plan
Stage 3: Group by (key: brand, func: sum(price)), input via range shuffle with brand
Stage 2: Join (key: item_key), inputs via hash shuffle with item_key
Stage 1: Scan on item, Scan on sales
21. Query Processing Example
● Distributed query plan (as above)
● Distributed processing
< Diagram >
Stage 1: five workers run scan tasks over the input splits (item, item, sales, sales, sales)
22. Query Processing Example
● Distributed processing (cont'd)
< Diagram >
Stage 2: the scan outputs are shuffled to five workers, which run the join tasks
23. Query Processing Example
● Distributed processing (cont'd)
< Diagram >
Stage 3: the join outputs are shuffled to five workers, which run the group-by tasks and produce the final aggregation
25. Query Optimization
● User queries are usually not written with performance in mind
● The query optimizer attempts to determine the most efficient way to execute a user query
○ It considers the possible query plans and chooses the best one
26. Extreme Example
● Query
○ select * from t where name like 'tajo%' order by id;
● Possible plans
○ Naive plan: Scan → Sort → Filter
■ Filters tuples only after the sort
■ Large cost for the sort
○ Better plan: Scan with Filter → Sort
■ Filters tuples immediately during the scan
■ Small cost for the sort, and fewer operators overall
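To see why the better plan wins, here is a toy Python sketch (synthetic data; the real cost difference comes from sorting far fewer tuples):

import random, string

random.seed(0)
rows = [{"id": random.randrange(10**6),
         "name": random.choice(["tajo", "hive", "spark"]) + random.choice(string.ascii_lowercase)}
        for _ in range(100_000)]

# naive plan: sort everything (100,000 rows), then filter
naive = [r for r in sorted(rows, key=lambda r: r["id"]) if r["name"].startswith("tajo")]

# better plan: filter during the scan, then sort only the survivors (~1/3 of the rows)
better = sorted((r for r in rows if r["name"].startswith("tajo")), key=lambda r: r["id"])

assert naive == better  # same answer, much cheaper sort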
27. Two Kinds of Query Optimization
● Rule-based optimization
○ A set of predefined rules is used to choose a good plan
○ Usually, heuristic approaches are used
■ Ex) filters should be pushed down to the lower part of the query plan as much as possible
● Cost-based optimization
○ Enumerates the possible query plans and chooses the one with the lowest cost
○ The cost function plays an important role
● Tajo utilizes both types of optimization
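As a minimal sketch of the cost-based side (Python; the candidate plans, cardinalities, and cost function are all made up for illustration):

import math

# candidate plans as (name, [(operator, estimated input rows), ...])
candidates = [
    ("sort_then_filter", [("scan", 1_000_000), ("sort", 1_000_000), ("filter", 1_000_000)]),
    ("filter_then_sort", [("scan", 1_000_000), ("filter", 1_000_000), ("sort", 1_000)]),
]

def cost(ops):
    # toy cost model: sorts cost n*log2(n), everything else costs n
    return sum(n * math.log2(n) if op == "sort" else n for op, n in ops)

best = min(candidates, key=lambda c: cost(c[1]))
print(best[0])  # -> filter_then_sort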
28. Query Optimization in Tajo
● Difference from traditional query optimization
○ Unlike in traditional database systems, pre-collected statistics are not so important
■ Data may be added or updated by several systems, including Flume, Kafka, Tajo, …
■ Pre-collected statistics can be useful, but are not fully trustworthy
○ It is important to optimize query plans with minimal statistics
■ e.g., the volume of the input relations
29. Query Optimization in Tajo
● Tajo has two different approaches to query optimization
○ Static optimization
■ The traditional approach
■ Optimizes the plan during the query planning phase
○ Progressive optimization
■ Optimizes the plan based on intermediate statistics collected while executing the query
● A query plan can be optimized without pre-collected statistics
● Especially effective for queries that require multi-stage execution
30. Logical Query Plan Optimization
● Rule-based optimization
○ Access path rewrite rule
■ Chooses the access path to the data
■ An index scan has the highest priority if available
○ Distributivity rule
■ Reduces filters based on distributivity
○ Filter pushdown rule
■ Pushes filters as far down the plan as possible
○ In-subquery rewrite rule
■ Transforms subqueries in 'IN' filters into semi (or anti) joins
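Here is a toy illustration of the filter pushdown rule in Python (the tuple-based plan encoding and the name-prefix dispatch are simplifications invented for this sketch):

# plans as nested tuples: (operator, args..., children...)
plan = ("filter", "sales.price > 10",
        ("join", "item_key",
         ("scan", "item"),
         ("scan", "sales")))

def push_down_filter(node):
    # if a filter on top of a join references only one side,
    # move it onto that side (toy rule: dispatch on the table
    # name prefix of the predicate)
    if node[0] == "filter" and node[2][0] == "join":
        pred, (_, key, left, right) = node[1], node[2]
        side = pred.split(".")[0]
        if left[1] == side:
            return ("join", key, ("filter", pred, left), right)
        if right[1] == side:
            return ("join", key, left, ("filter", pred, right))
    return node

print(push_down_filter(plan))
# -> ('join', 'item_key', ('scan', 'item'), ('filter', 'sales.price > 10', ('scan', 'sales')))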
31. Logical Query Plan Optimization
● Rule-based optimization (cont'd)
○ Projection pushdown rule
■ Pushes projections as far down the plan as possible
● Cost-based optimization
○ Join order optimization
■ Finds the join order with the lowest cost
■ Greedy heuristic: orders relations from small to large
● Very effective in a single-machine environment
● Needs improvement for parallel computing environments
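A sketch of that greedy size-based heuristic (Python; the relation names and sizes are made up):

# relations with their estimated sizes (rows)
relations = {"item": 10_000, "sales": 5_000_000, "store": 200}

def greedy_join_order(sizes):
    # join from small relations to large ones: sort by size and
    # fold left, so cheap joins happen first and intermediate
    # results (hopefully) stay small
    order = sorted(sizes, key=sizes.get)
    plan = order[0]
    for rel in order[1:]:
        plan = f"({plan} JOIN {rel})"
    return plan

print(greedy_join_order(relations))  # -> ((store JOIN item) JOIN sales)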
32. Distributed Query Plan Optimization
● Rule-based optimization
○ Two-phase execution of operators
■ Operators that require data shuffling, such as aggregation, join, or sort, are executed in two phases
■ The first phase performs local computation to reduce the amount of shuffled data
■ The second phase produces the final result of the operation
33. Two-phase Execution Example
● Logical query plan
Sort
  Group by
    Scan
● Distributed query plan
Stage 3: Sort (second phase)
Stage 2: Group by (second phase), then local sort (first phase)
Stage 1: Scan, then local group by (first phase)
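A toy sketch of two-phase aggregation in Python: each node pre-aggregates its partition locally, so only small partial results cross the shuffle boundary (the data is invented):

from collections import Counter

# three partitions of (brand, price) tuples, as if scanned on three nodes
partitions = [
    [("a", 10), ("b", 5), ("a", 1)],
    [("b", 2), ("b", 3)],
    [("a", 7), ("c", 4)],
]

# phase 1 (local): sum per brand within each partition
partials = []
for part in partitions:
    local = Counter()
    for brand, price in part:
        local[brand] += price
    partials.append(local)

# shuffle + phase 2 (final): merge the small partial results by key
final = Counter()
for local in partials:
    final.update(local)  # Counter.update adds counts

print(dict(final))  # -> {'a': 18, 'b': 10, 'c': 4}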
34. Distributed Query Plan Optimization
● Distributed join algorithm selection
○ Two representative distributed join algorithms
■ A join cannot generally be performed within a single stage in distributed systems
● Tuples with the same join key may be distributed over cluster nodes
■ Repartition join
● Both input relations are shuffled on the join key columns
■ Broadcast join
● Small relations are broadcast to every node before the join
35. Example of Repartition Join
● select … from employee e, department d where e.DeptName = d.DeptName
< Diagram >
36. Example of Broadcast Join
● select … from employee e, department d where e.DeptName = d.DeptName
< Diagram >
37. Distributed Join Algorithm Selection
● Repartition join vs. broadcast join
○ Given a set of joins, some parts can be executed with broadcast join while the remaining parts are executed with repartition join
● Which parts will be executed with broadcast join?
○ Greedy heuristic: broadcast join is used as often as possible
■ The size of each input relation must be smaller than a predefined threshold
■ The total volume of the broadcast relations must not exceed a predefined threshold
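A sketch of that greedy selection in Python (the thresholds and relation sizes are made up; in Tajo this decision happens during global planning):

# estimated sizes of join inputs in bytes (illustrative)
relation_sizes = {"dim_date": 1 << 20, "item": 8 << 20, "store": 2 << 20, "sales": 50 << 30}

PER_RELATION_LIMIT = 10 << 20  # a relation must be under 10 MB to broadcast
TOTAL_LIMIT = 16 << 20         # total broadcast volume must stay under 16 MB

def pick_broadcast_relations(sizes):
    broadcast, total = [], 0
    for rel in sorted(sizes, key=sizes.get):  # greedily, smallest first
        if sizes[rel] < PER_RELATION_LIMIT and total + sizes[rel] <= TOTAL_LIMIT:
            broadcast.append(rel)
            total += sizes[rel]
    return broadcast  # everything else falls back to repartition join

print(pick_broadcast_relations(relation_sizes))  # -> ['dim_date', 'store', 'item']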
39. Local Query Plan Optimization
● Selects the best algorithm based on the current resource status
○ Aggregation
■ Hash aggregation, sort aggregation
○ Join
■ Hash join, sort-merge join
● Hash-based algorithms are basically used, spilling data to disk when it does not fit into memory
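A toy decision rule in Python (the memory estimate and the API are invented for illustration) for picking the local aggregation algorithm:

def choose_aggregation(est_hash_table_bytes, free_memory_bytes):
    # prefer hash aggregation while the hash table is expected to fit
    # in memory; otherwise fall back to sort-based aggregation, which
    # streams and spills gracefully
    if est_hash_table_bytes <= free_memory_bytes:
        return "hash aggregation"
    return "sort aggregation"

print(choose_aggregation(512 << 20, 2 << 30))  # fits   -> hash aggregation
print(choose_aggregation(8 << 30, 2 << 30))    # spills -> sort aggregation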
40. Progressive Optimization
● Data repartitioning
○ Some operators, like join and aggregation, require shuffling data by key
○ The number of result partitions of a shuffle should be carefully decided
■ The number of partitions determines the number of tasks in the next stage
● At the beginning of each stage, the number of partitions is decided based on the actual input size
41. Progressive Optimization Example
● Plan: Stage 1 (Scan on item, local group by) → Stage 2 (Group by, local sort) → Stage 3 (Sort)
● If the default task size is 1 GB:
○ Stage 1 reads item (100 GB), so it runs 100 tasks, and its shuffle is initially planned with 100 partitions
○ The observed output of Stage 1 is only 50 GB, so the shuffle is re-planned to 50 partitions and Stage 2 runs 50 tasks
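The partition-count rule in this example reduces to simple arithmetic; a sketch in Python with the slide's numbers:

import math

DEFAULT_TASK_SIZE_GB = 1

def num_partitions(input_size_gb):
    # one partition (and hence one downstream task) per task-sized
    # chunk of the stage's actual input
    return math.ceil(input_size_gb / DEFAULT_TASK_SIZE_GB)

print(num_partitions(100))  # planned from the 100 GB scan -> 100
print(num_partitions(50))   # re-planned from the 50 GB intermediate -> 50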
42. Future Work
● Adding more optimization methods
● Improving cost functions for more effective cost-based optimization
● Adding new approaches for progressive optimization
○ Runtime query rewriting
○ Integration with genetic algorithms
○ …
43. Get Involved!
● General
○ http://tajo.apache.org
● Getting Started
○ http://tajo.apache.org/docs/current/getting_started.html
● Downloads
○ http://tajo.apache.org/downloads.html
● Jira – Issue Tracker
○ https://issues.apache.org/jira/browse/TAJO
● Join the mailing list
○ dev-subscribe@tajo.apache.org
○ issues-subscribe@tajo.apache.org