Introduction to Tajo
(Tajo: A Distributed Data Warehouse System for Hadoop)


                Hyunsik Choi
         (hyunsik at apache dot org)
Introduction
• Tajo is a distributed data warehouse system for Hadoop.
• Tajo is designed for low-latency, scalable ETL and ad-hoc queries
  on large data sets by leveraging advanced database techniques.
• Supports standard SQL
• Tajo has its own query engine and uses HDFS as its primary storage
  layer.
   – Direct control of distributed execution and data flow
   – A variety of query evaluation strategies
   – More optimization opportunities
• Row/Native columnar execution engine and optimizer
• An alternative to Hive/Pig on top of MapReduce
Design Principles
•   Scalability
•   Low Latency
•   Fault Tolerance
•   Application of Advanced Database Techniques
Overall Architecture
• The architecture of Tajo follows the master-worker
  model.
   – TajoMaster
      • A dedicated server for providing client service and coordinating
        QueryMasters
   – For each query, one QueryMaster and many TaskRunners work
     together.
• TaskRunner includes a local query engine that executes a
  directed acyclic graph (DAG) of physical operators.
• Tajo employs Hadoop YARN as a resource manager for
  large clusters.
Overall Architecture
[Figure: architecture diagram showing clients; the Tajo Master (Parser, Planner, Catalog, Optimizer, Query Manager); Query Masters, a TaskRunner, and Node Managers; and the Resource Manager of YARN.]
Data Structure & Terminology
• Logical Plan
  – A representation of relational algebra
  – Implemented as a graph of Java objects
  – (De)serialized as a JSON document via Gson
• Physical Execution Plan
  – A directed acyclic graph of physical operators
• Execution Block
  – A subquery executed across a number of cluster nodes.
• Distributed Query Execution Plan (DQEP)
  – A directed acyclic graph of execution blocks.
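As a rough illustration of the terminology above, a DQEP can be modeled as a DAG whose nodes are execution blocks, where a block can start only after the blocks it reads from have finished. The block names and the dependency map below are made up for this sketch and are not Tajo's actual classes.

```python
from graphlib import TopologicalSorter

# Hypothetical DQEP: each execution block maps to the blocks it depends on.
dqep = {
    "scan_a": set(),
    "scan_b": set(),
    "join": {"scan_a", "scan_b"},
    "sort": {"join"},
}


def execution_order(plan):
    """Return one valid order in which the execution blocks could run."""
    return list(TopologicalSorter(plan).static_order())


order = execution_order(dqep)
print(order)
```

Any order that places both scans before the join and the join before the sort is valid; the two scan blocks are independent and could run in parallel.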
Tajo Master
• It provides a client service that enables users to
  submit queries and request catalog information.
• It coordinates a number of QueryMasters.
• It provides a catalog service.
  – Catalog Server is embedded in Tajo Master.
  – Catalog Server
     • Maintains the table and index descriptions
     • Supports some statistics (min and max values, number of rows, and
       bytes)
     • Uses Apache Derby as the persistent storage via JDBC.
Query Master
• Query Master is responsible for
   – controlling a distributed execution plan (i.e., a DAG of
     execution blocks)
   – launching TaskRunner containers
   – coordinating TaskRunners
   – localizing tasks
   – scheduling tasks to TaskRunners
• It also monitors running tasks, collects statistics, and
  reports them to TajoMaster.
TaskRunner
• It contains a local query engine that actually executes tasks.
• A TaskRunner is launched by a NodeManager of Hadoop YARN.
• It is reused within an execution block.
   – According to the workload of the execution block, different resources
     are assigned to containers.
TaskRunner
• In a TaskRunner, the physical execution plan is
  finalized according to the container’s
  resources or the size of the input data.
  – It allows a TaskRunner to choose more efficient
    execution plans and to maximize the utilization of
    hardware capacity.
  – This feature is useful for heterogeneous
    environments.
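One way to picture this late decision is a rule that picks a physical operator only once the container's memory and the input size are known. The function names and the simple size thresholds below are assumptions made for this sketch; the slides do not describe Tajo's actual rules.

```python
def choose_sort_operator(input_bytes, memory_bytes):
    """Pick a sort operator from the input size and the container's memory.

    Hypothetical rule: sort in memory when the input fits, else spill to disk.
    """
    if input_bytes <= memory_bytes:
        return "in-memory sort"
    return "external sort"


def choose_aggregation_operator(estimated_groups_bytes, memory_bytes):
    """Hypothetical rule: hash aggregation only if the hash table fits in memory."""
    if estimated_groups_bytes <= memory_bytes:
        return "hash aggregation"
    return "sort aggregation"


small_container = choose_sort_operator(input_bytes=8 * 2**30, memory_bytes=2 * 2**30)
large_container = choose_sort_operator(input_bytes=8 * 2**30, memory_bytes=16 * 2**30)
```

The same task can therefore end up with different physical plans on different nodes, which is what makes the approach attractive for heterogeneous clusters.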
Local Query Engine
• Currently, Tajo provides a row-based execution engine.
   – It executes a physical execution plan.
   – The data flow follows the pull-based iterator model, as in traditional
     row-store databases.
       • For each tuple, the root node calls next() on its children, which call next()
         on their own children recursively.
[Figure: operator tree. External Merge Sort calls next() on Hash Join, which calls next() on two Sequential Scans.]
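The pull-based iterator model can be sketched with three small operators that each expose next(). This is a simplified illustration, not Tajo's engine: the class names are invented here, the scan reads from an in-memory list instead of a file, and the sort is a plain in-memory sort rather than an external merge sort.

```python
class Scan:
    """Leaf operator: yields tuples from an in-memory list (stand-in for a file)."""

    def __init__(self, rows):
        self._it = iter(rows)

    def next(self):
        return next(self._it, None)  # None signals end of input


class HashJoin:
    """Builds a hash table from the right child, then probes it with the left child."""

    def __init__(self, left, right, left_key, right_key):
        self.left, self.right = left, right
        self.left_key, self.right_key = left_key, right_key
        self.table = None
        self.pending = []

    def _build(self):
        self.table = {}
        while (row := self.right.next()) is not None:
            self.table.setdefault(row[self.right_key], []).append(row)

    def next(self):
        if self.table is None:
            self._build()
        while not self.pending:
            row = self.left.next()
            if row is None:
                return None
            matches = self.table.get(row[self.left_key], [])
            self.pending = [row + m for m in matches]
        return self.pending.pop(0)


class Sort:
    """Blocking operator: drains its child on the first call, then emits in order."""

    def __init__(self, child, key):
        self.child, self.key = child, key
        self.sorted_rows = None

    def next(self):
        if self.sorted_rows is None:
            rows = []
            while (row := self.child.next()) is not None:
                rows.append(row)
            self.sorted_rows = iter(sorted(rows, key=self.key))
        return next(self.sorted_rows, None)


# The root pulls tuples one at a time; each call cascades down the tree.
orders = Scan([(2, "pen"), (1, "ink"), (2, "cap")])
users = Scan([(1, "ann"), (2, "bob")])
plan = Sort(HashJoin(orders, users, 0, 0), key=lambda r: (r[0], r[1]))

result = []
while (row := plan.next()) is not None:
    result.append(row)
```

Only the root is driven by the caller; every other operator produces a tuple only when its parent asks for one.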
Physical Operators
• We have already implemented various row-based
  physical operators:
  –   Block nested loop join
  –   Hash join
  –   Merge join
  –   External sort
  –   In-memory sort
  –   Selection
  –   Projection
  –   Sort aggregation
  –   Hash aggregation
Planning & Optimization
• The following optimizations have been implemented:
  – Simplification of Algebraic expressions
  – Selection & projection push down optimization
  – Modification of the order of some physical
    operators
  – Cost-based join ordering (in progress, but almost
    done)
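Selection push-down can be illustrated on a toy plan tree: a selection that refers to a single table is moved below a join so that rows are filtered before they are joined. The node classes and the single-pass rewrite below are assumptions for this sketch and do not reflect Tajo's planner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Join:
    left: object
    right: object


@dataclass(frozen=True)
class Select:
    child: object
    table: str      # the only table the predicate refers to
    predicate: str  # kept as text; this sketch never evaluates it


def tables(node):
    """Return the set of base tables under a plan node."""
    if isinstance(node, Scan):
        return {node.table}
    if isinstance(node, Select):
        return tables(node.child)
    return tables(node.left) | tables(node.right)


def push_down(node):
    """Move a single-table selection below a join, toward the scan it filters."""
    if isinstance(node, Select) and isinstance(node.child, Join):
        join = node.child
        if node.table in tables(join.left):
            pushed = Select(join.left, node.table, node.predicate)
            return Join(push_down(pushed), push_down(join.right))
        if node.table in tables(join.right):
            pushed = Select(join.right, node.table, node.predicate)
            return Join(push_down(join.left), push_down(pushed))
    if isinstance(node, Select):
        return Select(push_down(node.child), node.table, node.predicate)
    if isinstance(node, Join):
        return Join(push_down(node.left), push_down(node.right))
    return node


before = Select(Join(Scan("lineitem"), Scan("orders")), "orders", "o_totalprice > 100")
after = push_down(before)
```

After the rewrite the filter sits directly on the orders scan, so the join sees fewer rows.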
Storage System
• Provides split methods that divide an input data set into a
  number of fragments (similar to splits).
• Provides Scanner and Appender interfaces
   – Specialized for structured data
• Enables users to implement their own scanners or appenders.
   – If unstructured data can be modeled as structured data, it
     can be used in Tajo.
• Currently, it provides various row and columnar file formats,
  such as CSVFile, RowFile, RCFile, and Trevni (still unstable).
   – Existing formats can be ported easily, as in this example:
       • https://github.com/tajo-project/tajo/blob/master/tajo-core/tajo-core-storage/src/main/java/tajo/storage/rcfile/RCFileWrapper.java
Repartitioning
• In distributed query execution, data repartitioning (like
  the shuffle of MapReduce) is very important.
• Tajo has two kinds of repartition methods:
   – Range repartition
      • Boosts the performance of sort operation
      • Also used to generate totally-ordered multiple files
   – Hash repartition
      • Without sort, Tajo can repartition the intermediate data to
        workers using the hash repartition.
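The two methods differ only in how a row's destination is computed. The sketch below is a minimal illustration; the function names, the CRC32 hash, and the hand-picked boundaries are assumptions, not Tajo's implementation.

```python
from bisect import bisect_right
import zlib


def hash_partition(rows, key, num_partitions):
    """Route each row by a stable hash of its key; output is not ordered."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        h = zlib.crc32(str(key(row)).encode("utf-8"))
        parts[h % num_partitions].append(row)
    return parts


def range_partition(rows, key, boundaries):
    """Route each row by key range; boundaries are sorted split points.

    Partition i holds keys k with boundaries[i-1] <= k < boundaries[i],
    so the partitions cover disjoint, ordered ranges.
    """
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect_right(boundaries, key(row))].append(row)
    return parts


words = ["delta", "alpha", "zulu", "mike", "papa", "kilo"]
by_range = range_partition(words, lambda w: w, ["l", "p"])
by_hash = hash_partition(words, lambda w: w, 3)
```

Because range partitions are ordered relative to each other, sorting each one independently yields a total order across the output files; hash partitions only guarantee that equal keys land on the same worker.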
Repartitioning: Range Repartition

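[Figure: input partitions with overlapping key ranges (A-V, D-Z, B-P) are repartitioned into the ranges A-K, L-O, and P-Z. Each is a disjoint range.]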
An example of DQEP
[Figure: a plan (Sort over an Inner Join of two Scans) shown next to its execution blocks: two blocks that each Scan and Hash Partition; one block that Hash Fetches both inputs, Scans, Inner Joins, Sorts, and Range Partitions; and one block that Range Fetches, Scans, and Sorts.]
A DQEP is a DAG of execution blocks.
Query Transformation
[Figure: in the distributed query engine, SQL → AST → Logical Plan → Optimized Logical Plan → Distributed Execution Plan → Tasks. A task consists of a logical plan (JSON) plus fragments and fetches.]
• Fragment is similar to InputSplit of MapReduce.
  – Additionally, it includes Schema and Metadata.
• Fetch is a URI to be fetched.
• According to the DQEP, each execution block is
  localized and transformed into a number of tasks.
QUERY EVALUATION
Distributed Sort
• Tajo provides the range repartition.
  – So, it is easy to support distributed sort.
• Usually, a distributed sort requires two sort phases
  separated by a range repartition.
  1. First Phase: Sort on splits
  2. Range Repartition
  3. Second Phase: Sort on repartitioned data
• Finally, multiple totally ordered output files
  are written to HDFS.
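The three steps can be sketched on in-memory lists. This is a simplified illustration: the splits and outputs are plain Python lists instead of HDFS files, the range boundaries are given by hand (how Tajo chooses them is not covered here), and the second phase simply re-sorts each partition where a real system could merge the pre-sorted runs.

```python
from bisect import bisect_right


def distributed_sort(splits, boundaries):
    """Return one sorted list per range partition (one per output file)."""
    # First phase: each worker sorts its own split.
    sorted_splits = [sorted(split) for split in splits]

    # Range repartition: send every value to the worker that owns its range.
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for split in sorted_splits:
        for value in split:
            partitions[bisect_right(boundaries, value)].append(value)

    # Second phase: each receiving worker sorts what it received.
    return [sorted(partition) for partition in partitions]


outputs = distributed_sort([[9, 1, 5], [4, 8, 2], [7, 3, 6]], boundaries=[4, 7])
```

Reading the output lists in partition order gives a globally sorted sequence, which is why range repartitioning produces totally ordered files.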
Aggregation
• Tajo provides the range/hash repartitions.
• Aggregation can use either repartition.
   – According to the sort order of input table or
     consecutive operations, we can use different
     repartition methods.
• Usually, it requires two phases, but various
  aggregation algorithms can be mixed.
   1. First Phase: Sort Aggregation (or Hash Aggregation)
   2. Hash Repartition
   3. Second Phase: Hash Aggregation (if intermediate
      data fits in memory)
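A two-phase SUM with a hash repartition in between might look like the following sketch. The function names and the CRC32-based routing are assumptions for illustration; both phases use hash aggregation here, although the first phase could equally be a sort aggregation.

```python
from collections import defaultdict
import zlib


def local_aggregate(split):
    """First phase: each worker pre-aggregates its own split."""
    partial = defaultdict(int)
    for key, value in split:
        partial[key] += value
    return dict(partial)


def hash_repartition(partials, num_workers):
    """Send every partial result for a key to the same worker."""
    buckets = [[] for _ in range(num_workers)]
    for partial in partials:
        for key, value in partial.items():
            worker = zlib.crc32(key.encode("utf-8")) % num_workers
            buckets[worker].append((key, value))
    return buckets


def final_aggregate(bucket):
    """Second phase: combine the partial sums received for each key."""
    total = defaultdict(int)
    for key, value in bucket:
        total[key] += value
    return dict(total)


def two_phase_sum(splits, num_workers):
    partials = [local_aggregate(split) for split in splits]
    buckets = hash_repartition(partials, num_workers)
    result = {}
    for bucket in buckets:
        result.update(final_aggregate(bucket))  # keys are disjoint across buckets
    return result


totals = two_phase_sum(
    [[("a", 1), ("b", 2), ("a", 3)], [("b", 4), ("c", 5)]],
    num_workers=2,
)
```

Pre-aggregating before the repartition reduces the data sent over the network to at most one row per key per split.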
Join
• It supports various existing join strategies used in
  shared-nothing databases (or Hive).
   – Broadcast Join
   – Repartition Join (hash and range)
• It also requires two phases and can mix various
  join algorithms.
   1. First Phase: Scan (and filter by selection pushdown)
   2. Hash (or range) repartition
   3. Second Phase: Hash Join (or Merge Join if range repartition is
      performed)
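The hash variant of the repartition join can be sketched as follows. Both inputs are routed with the same hash function so that matching keys meet on the same worker, and each worker then runs a local hash join. The function names and the CRC32 routing are assumptions for this sketch.

```python
from collections import defaultdict
import zlib


def hash_repartition(rows, key_index, num_workers):
    """Route rows so that equal join keys always reach the same worker."""
    buckets = [[] for _ in range(num_workers)]
    for row in rows:
        worker = zlib.crc32(str(row[key_index]).encode("utf-8")) % num_workers
        buckets[worker].append(row)
    return buckets


def local_hash_join(left_rows, right_rows, left_key, right_key):
    """Build a hash table on the right input and probe it with the left input."""
    table = defaultdict(list)
    for row in right_rows:
        table[row[right_key]].append(row)
    return [l + r for l in left_rows for r in table.get(l[left_key], [])]


def repartition_join(left, right, left_key, right_key, num_workers):
    left_parts = hash_repartition(left, left_key, num_workers)
    right_parts = hash_repartition(right, right_key, num_workers)
    joined = []
    for l_part, r_part in zip(left_parts, right_parts):
        joined.extend(local_hash_join(l_part, r_part, left_key, right_key))
    return joined


orders = [(1, "ink"), (2, "pen"), (2, "cap"), (3, "pad")]
users = [(1, "ann"), (2, "bob")]
joined = sorted(repartition_join(orders, users, 0, 0, num_workers=3))
```

A broadcast join skips the repartition step by copying the smaller table to every worker, which tends to pay off only when that table is small.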
Join
• In addition, we provide a new join strategy if
  the larger table is sorted on a join key.
  1. The smaller table is repartitioned via range
     repartition.
  2. The range partitions are assigned to the nodes whose large
     table partitions cover the same join key range.
  3. Each node performs the merge join.
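A minimal sketch of this strategy, assuming the larger table is already stored as sorted partitions whose key ranges are described by a list of boundaries. The function names, the boundary representation, and the local sort of the smaller side before merging are assumptions for illustration.

```python
from bisect import bisect_right


def merge_join(left, right, left_key, right_key):
    """Join two inputs that are both sorted on their join keys."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Emit the cross product of the two runs sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][right_key] == lk:
                j_end += 1
            while i < len(left) and left[i][left_key] == lk:
                out.extend(left[i] + r for r in right[j:j_end])
                i += 1
            j = j_end
    return out


def sorted_table_join(large_partitions, boundaries, small, large_key, small_key):
    """large_partitions[i] holds sorted rows with boundaries[i-1] <= key < boundaries[i]."""
    # Step 1: range-repartition the smaller table using the large table's ranges.
    small_parts = [[] for _ in large_partitions]
    for row in small:
        small_parts[bisect_right(boundaries, row[small_key])].append(row)

    # Steps 2-3: each node pairs its large partition with the matching range
    # of the smaller table and runs a merge join; the large side never moves.
    result = []
    for large_part, small_part in zip(large_partitions, small_parts):
        small_part.sort(key=lambda r: r[small_key])
        result.extend(merge_join(large_part, small_part, large_key, small_key))
    return result


large = [[(1, "x"), (2, "y")], [(5, "z"), (5, "w"), (8, "v")]]
small = [(5, "s5"), (2, "s2"), (9, "s9"), (1, "s1")]
joined = sorted_table_join(large, [5], small, large_key=0, small_key=0)
```

The appeal of the approach is that only the smaller table crosses the network, while the larger table is neither re-sorted nor shuffled.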
EXPERIMENTS
• We carried out experiments comparing Tajo with
  Hive on top of MapReduce.
• The following results are based on experiments that
  we carried out in August 2012.
• Experimental Environments
   – 1TB TPC-H Data Set
      • Some of TPC-H Queries
   – 32 Cluster Nodes
      •   Intel i5
      •   16GB Memory
      •   1TB HDD x 2
      •   1 Gigabit Ethernet
   – Hadoop 0.20.2-cdh3u3
Tajo vs. Hive on TPC-H 1TB
[Figure: chart comparing Tajo and Hive on the TPC-H 1TB data set.]
    Experiment results submitted to ICDE 2013
           (Evaluated in August 2012)
FUTURE WORK
• Cost-based Optimization
   – Cost-based Join Order Selection (almost done)
   – Progressive Optimization
       • During query processing, the optimizer will try to re-optimize the remaining parts
         of the chosen query plan.
• Column Store
   – Native Columnar Execution Engine
   – Optimizer for Columnar Execution Engine
   – Porting ORC File
• Low Latency
   – Online Aggregation
       • Enable users to get estimates of an aggregate query in an online fashion as
         soon as the query is issued
   – Intermediate data streaming (push-based transmission)
• More Mature Physical Operators for the Row-based Engine
Appendix - History
• August 2012: Benchmark Test
   – Some TPC-H queries, TPC-H 1TB on 32 cluster nodes
   – Tajo outperforms Hive by about 1.5-3 times
• August 2012: Demo paper was submitted to IEEE ICDE 2013.
• October 2012: Tajo was accepted to IEEE ICDE 2013.
   – We will visit Brisbane to demonstrate Tajo in April 2013.
• January 2013: Benchmark Test by Gruter
   – Gruter (a big data company in South Korea - http://www.gruter.com/)
   – TPC-H queries 1, 3, and 6; 100GB on 10 cluster nodes
   – Tajo outperforms Hive by about 2-3 times and is comparable to Cloudera
     Impala.

Apache Drill talk ApacheCon 2018
 

Tajo: A Distributed Data Warehouse System for Hadoop

  • 1. Introduction to Tajo (Tajo: A Distributed Data Warehouse System for Hadoop) Hyunsik Choi (hyunsik at apache dot org)
  • 2. Introduction • Tajo is a distributed warehouse system for Hadoop. • Tajo is designed for low-latency and scalable ETL and ad-hoc queries on large-data sets by leveraging advanced database techniques. • Support SQL standards • Tajo has its own query engine and uses HDFS as a primary storage layer. – Direct control of distributed execution and data flow – A variety of query evaluation strategies – More optimization opportunities • Row/Native columnar execution engine and optimizer • An alternative choice to Hive/Pig on the top of MapReduce
  • 3. Design Principles • Scalability • Low Latency • Fault Tolerance • Application of Advanced Database Techniques
  • 4. Overall Architecture • The architecture of Tajo follows the master-worker model. – TajoMaster • A dedicated server for providing client service and coordinating QueryMasters – For each query, one QueryMaster and many TaskRunners work together. • A TaskRunner includes a local query engine that executes a directed acyclic graph (DAG) of physical operators. • Tajo employs Hadoop Yarn as a resource manager for large clusters.
  • 5. Overall Architecture [Diagram components: Clients; Tajo Master (Parser, Planner, Catalog, Optimizer, Query Manager); Query Masters; TaskRunner; Node Managers; Resource Manager of Yarn]
  • 6. Data Structure & Terminology • Logical Plan – A representation of relational algebra – Implemented as a graph of Java objects – (De)serialized as a JSON document via GSON • Physical Execution Plan – A directed acyclic graph of physical operators • Execution Block – A subquery executed across a number of cluster nodes. • Distributed Query Execution Plan (DQEP) – A directed acyclic graph of execution blocks.
  • 7. Tajo Master • It provides a client service that enables users to submit queries and to request catalog information. • It coordinates a number of QueryMasters. • It provides a catalog service. – Catalog Server is embedded in Tajo Master. – Catalog Server • Maintains the table and index descriptions • Supports some statistics (min, max value, num of rows, and bytes) • Uses Apache Derby as the persistent storage via JDBC.
  • 8. Query Master • Query Master is responsible for – controlling a distributed execution plan (i.e., a DAG of execution blocks) – launching TaskRunner containers – coordinating TaskRunners – localizing tasks – scheduling tasks to TaskRunners • It also monitors running tasks, collects statistics, and reports them to TajoMaster.
  • 9. TaskRunner • It contains a local query engine that actually executes tasks. • A TaskRunner is launched by the NodeManager of Hadoop Yarn. • It is reused within an execution block. – According to the workload of the execution block, different resources are assigned to containers.
  • 10. TaskRunner • In TaskRunner, a physical execution plan is completed according to the container’s resource or the size of input data. – It allows a TaskRunner to choose more efficient execution plans and to maximize the utilization of hardware capacity. – This feature is useful for heterogeneous environments.
  • 11. Local Query Engine • Currently, Tajo provides a row-based execution engine. – It executes a physical execution plan. – The data flow follows the pull-based iterator model, as in traditional row-store databases. • For each tuple, the root node calls “next()” on its children, which call “next()” on their children recursively. [Diagram: an External Merge Sort operator calls next() on a Hash Join, which calls next() on two Sequential Scans.]
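The pull-based iterator model described on this slide can be sketched as follows. This is a minimal Python illustration, not Tajo's actual (Java) engine: the class names `SeqScan`, `HashJoin`, and `Sort` are hypothetical, and an in-memory sort stands in for the external merge sort shown in the diagram.

```python
class SeqScan:
    """Leaf operator: hands out rows of an in-memory table one at a time."""
    def __init__(self, rows):
        self._it = iter(rows)

    def next(self):
        return next(self._it, None)  # None signals end of input


class HashJoin:
    """Builds a hash table on the right child, then probes it with left rows."""
    def __init__(self, left, right, left_key, right_key):
        self.left, self.right = left, right
        self.left_key, self.right_key = left_key, right_key
        self._table = None
        self._pending = []

    def next(self):
        if self._table is None:  # build phase runs on the first call
            self._table = {}
            while (row := self.right.next()) is not None:
                self._table.setdefault(row[self.right_key], []).append(row)
        while not self._pending:  # probe phase: pull one left row at a time
            row = self.left.next()
            if row is None:
                return None
            matches = self._table.get(row[self.left_key], [])
            self._pending = [row + m for m in matches]
        return self._pending.pop(0)


class Sort:
    """Blocking operator: drains its child on the first call, then emits in order."""
    def __init__(self, child, key):
        self.child, self.key = child, key
        self._sorted = None

    def next(self):
        if self._sorted is None:
            rows = []
            while (row := self.child.next()) is not None:
                rows.append(row)
            self._sorted = iter(sorted(rows, key=lambda r: r[self.key]))
        return next(self._sorted, None)


# The root pulls tuples; each call propagates down the operator tree.
users = [(2, "bob"), (1, "ann")]
orders = [(1, "book"), (2, "pen"), (1, "ink")]
plan = Sort(HashJoin(SeqScan(orders), SeqScan(users), 0, 0), key=0)

result = []
while (row := plan.next()) is not None:
    result.append(row)
```

Only the root is driven by the caller; every other operator produces a tuple solely when its parent asks for one.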
  • 12. Physical Operators • We have already implemented various row-based physical operators: – Block nested loop join – Hash join – Merge join – External sort – In-memory sort – Selection – Projection – Sort aggregation – Hash aggregation
  • 13. Planning & Optimization • The following optimizations have been implemented: – Simplification of algebraic expressions – Selection & projection push down optimization – Modification of the order of some physical operators – Cost-based join ordering (in progress, almost done)
  • 14. Storage System • Provides split methods that divide an input data set into a number of fragments (similar to splits). • Provides Scanner and Appender interfaces – Specialized to structured data • Enables users to implement their own scanners or appenders. – If unstructured data can be modeled as structured data, it can be used in Tajo. • Currently, it provides various row/columnar store file formats, such as CSVFile, RowFile, RCFile, and Trevni (still unstable). – They can be easily ported as follows: • https://github.com/tajo-project/tajo/blob/master/tajo-core/tajo-core-storage/src/main/java/tajo/storage/rcfile/RCFileWrapper.java
  • 15. Repartitioning • In distributed query execution, data repartitioning (like the shuffle of MapReduce) is very important. • Tajo has two kinds of repartition methods: – Range repartition • Boosts the performance of sort operations • Also used to generate totally-ordered multiple files – Hash repartition • Without sorting, Tajo can repartition the intermediate data to workers using the hash repartition.
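The two repartition methods can be illustrated with a short Python sketch. The function names and the use of CRC32 as the hash are assumptions made for illustration; the slides do not say how Tajo hashes keys or picks range boundaries.

```python
import bisect
import zlib


def hash_repartition(rows, key, num_partitions):
    """Send each row to the partition chosen by a hash of its key column."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        h = zlib.crc32(repr(row[key]).encode("utf-8"))  # stable across runs
        parts[h % num_partitions].append(row)
    return parts


def range_repartition(rows, key, boundaries):
    """Send each row to the disjoint key range it falls in.

    `boundaries` is a sorted list of split points; partition i receives
    keys k with boundaries[i-1] <= k < boundaries[i].
    """
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect.bisect_right(boundaries, row[key])].append(row)
    return parts


rows = [("apple", 3), ("mango", 1), ("zebra", 7), ("kiwi", 2),
        ("lemon", 5), ("pear", 4), ("apple", 9)]
by_hash = hash_repartition(rows, 0, 3)
by_range = range_repartition(rows, 0, ["l", "p"])
```

Hash repartitioning only guarantees that equal keys meet in the same partition. Range repartitioning additionally keeps the partitions ordered relative to each other, which is what makes it useful for sorting.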
  • 16. Repartitioning: Range Repartition [Diagram: inputs with the overlapping key ranges A-V, D-Z, and B-P are repartitioned into the ranges A-K, L-O, and P-Z. Each is a disjoint range.]
  • 17. An example of DQEP [Diagram: execution blocks composed of Scan, Hash Partition, Hash Fetch, Inner Join, Sort, Range Partition, and Range Fetch operators.] A DQEP is a DAG of execution blocks.
  • 18. Query Transformation [Diagram: inside the Distributed Query Engine, a query passes through the stages SQL, AST, Logical Plan, Optimized Logical Plan, Distributed Execution Plan, and Tasks. A Task consists of a Logical Plan (JSON) plus Fragment & Fetch.] • Fragment is similar to InputSplit of MapReduce. • Additionally, it includes Schema and Metadata. • Fetch is a URI to be fetched. • According to the DQEP, each execution block is localized and transformed into a number of tasks.
  • 20. Distributed Sort • Tajo provides the range repartition. – So, it is easy to support the distributed sort. • Usually, a distributed sort requires two sort phases with a range repartition between them. 1. First Phase: Sort on splits 2. Range Repartition 3. Second Phase: Sort on repartitioned data • Finally, the multiple ordered output files are written to HDFS.
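The steps above can be sketched in Python as follows. This is an illustrative single-process simulation under assumed names (`distributed_sort`, `boundaries`); in particular, merging the already sorted runs in the second phase is one possible way to "sort on repartitioned data", not necessarily what Tajo does.

```python
import bisect
import heapq


def distributed_sort(splits, boundaries):
    """Return one sorted list per key range; concatenated, they are totally ordered."""
    num_parts = len(boundaries) + 1
    runs = [[] for _ in range(num_parts)]  # sorted runs received by each partition

    for split in splits:
        local = sorted(split)  # first phase: sort on each split
        pieces = [[] for _ in range(num_parts)]
        for value in local:  # range repartition of the sorted split
            pieces[bisect.bisect_right(boundaries, value)].append(value)
        for p, piece in enumerate(pieces):
            if piece:
                runs[p].append(piece)  # each piece is still sorted

    # Second phase: each partition merges the sorted runs it received.
    return [list(heapq.merge(*r)) for r in runs]


splits = [[9, 1, 5], [4, 8, 2], [7, 3, 6]]
output_files = distributed_sort(splits, boundaries=[4, 7])
```

Because the ranges are disjoint and ordered, the per-partition outputs can be written as separate files without any final global merge.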
  • 21. Aggregation • Tajo provides the range/hash repartitions. • Aggregation can use either repartition. – According to the sort order of input table or consecutive operations, we can use different repartition methods. • Usually, it requires two phases, but various aggregation algorithms can be mixed. 1. First Phase: Sort Aggregation (or Hash Aggregation) 2. Hash Repartition 3. Second Phase: Hash Aggregation (if intermediate data fits in memory)
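A two-phase aggregation with a hash repartition in between can be sketched as below. The example computes a per-key SUM with hash aggregation in both phases; the function names and the CRC32 hash are assumptions for illustration only.

```python
import zlib
from collections import defaultdict


def hash_aggregate(pairs):
    """Sum values per key using an in-memory hash table."""
    acc = defaultdict(int)
    for key, value in pairs:
        acc[key] += value
    return list(acc.items())


def two_phase_sum(splits, num_partitions):
    # First phase: each worker pre-aggregates its own split.
    partials = [hash_aggregate(split) for split in splits]

    # Hash repartition: all partial sums of a key go to the same partition.
    parts = [[] for _ in range(num_partitions)]
    for partial in partials:
        for key, value in partial:
            parts[zlib.crc32(key.encode("utf-8")) % num_partitions].append((key, value))

    # Second phase: each partition combines the partial sums it received.
    final = {}
    for part in parts:
        final.update(hash_aggregate(part))
    return final


splits = [[("a", 1), ("b", 2), ("a", 3)], [("b", 4), ("c", 5)]]
totals = two_phase_sum(splits, num_partitions=2)
```

Pre-aggregating before the repartition reduces the amount of intermediate data that has to cross the network.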
  • 22. Join • It supports various existing join strategies used in shared-nothing databases (or Hive). – Broadcast Join – Repartition Join (hash and range) • It also requires two phases and can mix various join algorithms. 1. First Phase: Scan (and filter by selection pushdown) 2. Hash (or range) repartition 3. Second Phase: Hash Join (or Merge Join if range repartition is performed)
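As a rough sketch of the repartition join with hash repartitioning, the following Python code co-partitions both inputs on the join key and then runs a local hash join per partition. All names are hypothetical, and the whole flow is simulated in one process.

```python
import zlib
from collections import defaultdict


def hash_repartition(rows, key, num_partitions):
    """Route rows to partitions by a hash of the join key."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        h = zlib.crc32(repr(row[key]).encode("utf-8"))
        parts[h % num_partitions].append(row)
    return parts


def local_hash_join(left, right, left_key, right_key):
    """Build a hash table on the right input and probe it with the left input."""
    table = defaultdict(list)
    for r in right:
        table[r[right_key]].append(r)
    return [l + r for l in left for r in table.get(l[left_key], [])]


def repartition_join(left, right, left_key, right_key, num_partitions):
    # Both inputs use the same hash function, so matching keys are co-located.
    left_parts = hash_repartition(left, left_key, num_partitions)
    right_parts = hash_repartition(right, right_key, num_partitions)
    out = []
    for lp, rp in zip(left_parts, right_parts):
        out.extend(local_hash_join(lp, rp, left_key, right_key))
    return out


orders = [(1, "book"), (2, "pen"), (1, "ink"), (3, "cup")]
customers = [(1, "ann"), (2, "bob")]
joined = sorted(repartition_join(orders, customers, 0, 0, num_partitions=4))
```

A broadcast join would skip the repartition of the larger input and instead send the whole smaller input to every node.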
  • 23. Join • In addition, we provide a new join strategy if the larger table is sorted on a join key. 1. A smaller table is repartitioned via range repartition. 2. Assign the range partitions to nodes whose large table partitions correspond to the join key range. 3. Each node performs the merge join.
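The strategy for a pre-sorted larger table might look like the following sketch. It assumes the larger table is already stored as sorted partitions with known range boundaries; the names `sorted_table_join`, `large_parts`, and `boundaries` are hypothetical, and node assignment is simulated by pairing partitions by index.

```python
import bisect


def merge_join(left, right, left_key, right_key):
    """Join two inputs that are both sorted on the join key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            j_end = j  # find the run of right rows sharing this key
            while j_end < len(right) and right[j_end][right_key] == lk:
                j_end += 1
            while i < len(left) and left[i][left_key] == lk:
                out.extend(left[i] + r for r in right[j:j_end])
                i += 1
            j = j_end
    return out


def sorted_table_join(large_parts, boundaries, small, large_key, small_key):
    # Step 1: range-repartition the smaller table with the large table's boundaries.
    small_parts = [[] for _ in large_parts]
    for row in small:
        small_parts[bisect.bisect_right(boundaries, row[small_key])].append(row)
    # Steps 2 and 3: each range is joined where the matching large partition lives.
    out = []
    for lp, sp in zip(large_parts, small_parts):
        sp_sorted = sorted(sp, key=lambda r: r[small_key])
        out.extend(merge_join(lp, sp_sorted, large_key, small_key))
    return out


large_parts = [[(1, "a"), (2, "b"), (2, "c")], [(5, "d"), (7, "e")]]
small = [(7, "x"), (2, "y"), (3, "z"), (2, "w")]
joined = sorted_table_join(large_parts, [4], small, 0, 0)
```

Only the smaller table moves; the larger table is read in place and never re-sorted.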
  • 25. Experiments • We carried out experiments in comparison with Hive on top of MapReduce. • The following results are based on experiments that we carried out in August 2012. • Experimental Environments – 1TB TPC-H Data Set • Some of the TPC-H queries – 32 Cluster Nodes • Intel i5 • 16GB Memory • 1TB HDD x 2 • 1 Gigabit Ethernet network – Hadoop 0.20.2-cdh3u3
  • 26. Tajo vs. Hive on TPC-H 1TB Experiment results submitted to ICDE 2013 (evaluated in August 2012)
  • 28. Future Works • Cost-based Optimization – Cost-based Join Order Selection (almost done) – Progressive Optimization • During query processing, the optimizer will try to reoptimize the remaining parts of the determined query plan. • Column Store – Native Columnar Execution Engine – Optimizer for Columnar Execution Engine – Porting ORC File • Low Latency – Online Aggregation • Enables users to get estimates of an aggregate query in an online fashion as soon as the query is issued – Intermediate data streaming (push-based transmission) • More Mature Physical Operators for the Row-based Engine
  • 29. Appendix - History • August 2012: Benchmark Test – Some TPC-H queries, TPC-H 1TB on 32 cluster nodes – Tajo outperforms Hive by about 1.5-3 times • August 2012: Demo paper was submitted to IEEE ICDE 2013. • October 2012: Tajo was accepted to IEEE ICDE 2013. • We will visit Brisbane in order to demonstrate Tajo in April 2013. • January 2013: Benchmark Test by Gruter – Gruter (a big data company in South Korea - http://www.gruter.com/) – TPC-H queries 1, 3, and 6, 100GB on 10 cluster nodes – Tajo outperforms Hive by about 2-3 times, and is comparable to Cloudera Impala.