SlideShare a Scribd company logo
A Benchmark & Comparison
on Hive (2.1.0) LLAP and
Presto (0.208)
Data Engineering@SmartNews
Yu Liu 2018/10/01
Background and Preliminaries
TPC-DS and TPC-H benchmark suits
Hive LLAP (link)
Similar results from Hortonworks and Microsoft
Conclusions for Impatience
❏ Hive LLAP brings significant improvements on performance
❏ Hive LLAP is outperformed on TPC-DS, compared with non-LLAP and Presto
❏ Presto shows some advantages on TPC-H
❏ Hive LLAP causes bigger footprints of RAM usage and need careful tuning
Environment Setting
Cluster configurations (AWS EC2):
❏ 1 X Master : r4.xlarge (4 vCPU - 32GB RAM)
❏ 10 X Workers : i3.large (2 vCPU - 16GB RAM)
Stacks:
❏ HDP 2.6 (Hive 2.1.0, Tez 0.7.0, Calcite 1.2.0)
❏ Presto 0.208
TPC Data Sets
❏ 10 GB Text/ORC format for both DS and H
Results and Comparisons
The way of calculating query performance in this table including “query-duration” and “job-submission” time.
Dive into Details
In total 99 queries
Hive LLAP Presto Comparison (H v P)
Faster cases (num) 53 17 3.1 times
Total Runtime (s) 1351 2058 65.6% (1.5 times
faster)
Failed cases (num) 0 21 (OOM or syntax
error)
Due to the time constraint, the benchmark used a same set of SQLs for Hive 2
Biggest Differences
Query No. Presto (sec.) Hive LLAP (sec.) Difference ratio
query39 188 29 6.48
query47 56 13 4.31
query9 28 7 4
query44 8 30 3.75
query22 48 161 3.35
query88 50 15 3.33
Compare with Current EMR(5) Installation
Conditions:
➔ Input Data: same (24 tables,10 GB text and converted to ORC format)
➔ TPC-DS queries on ORC tables
➔ EMR 5 instances spec. are much better (worker nodes = 16 vCPU, 32GB RAM x 10)
Results: tens of times faster than EMR5 (details in next slide)
Reasons:
1. LLAP setting
2. ORC file format issue
3. Other Hive configurations
TPC-DS Queries Duration Time
Query No. Run on EMR(5) in sec. Run on Hive LLAP in sec. Difference (times)
No. 1 130.717 1.855 70
No. 11 184.94 7.434 25
No. 12 31.565 1.384 23
No. 50 90.435 11.576 8
No. 66 140.1 3.742 38
The way of calculating query performance in this table only including “query-duration” time.
Many other details are omitted
The following details are omitted because of space, in general and obviously they
are slower
❏ Hive with LLAP on MR engine
❏ Hive with Tez without LLAP
Future work
❏ Hive LLAP with Druid as the storage engine
❏ Compare with EMR 5.0 installation (with some constraint on supporting ORC)
❏ Parquet vs ORC
❏ ORC with different compaction strategies (BI/ETL/Hybrid)
❏ Evaluation on creations of bloom filter and zone map

More Related Content

What's hot

From Device to Data Center to Insights
From Device to Data Center to InsightsFrom Device to Data Center to Insights
From Device to Data Center to Insights
DataWorks Summit/Hadoop Summit
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
 
Apache Hive ACID Project
Apache Hive ACID ProjectApache Hive ACID Project
Apache Hive ACID Project
DataWorks Summit/Hadoop Summit
 
LLAP Nov Meetup
LLAP Nov MeetupLLAP Nov Meetup
LLAP Nov Meetup
t3rmin4t0r
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
DataWorks Summit/Hadoop Summit
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
DataWorks Summit/Hadoop Summit
 
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobilePractice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China Mobile
DataWorks Summit
 
Hive acid and_2.x new_features
Hive acid and_2.x new_featuresHive acid and_2.x new_features
Hive acid and_2.x new_features
Alberto Romero
 
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage SubsystemEvolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
DataWorks Summit/Hadoop Summit
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hive
rxu
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
DataWorks Summit/Hadoop Summit
 
LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BI
DataWorks Summit
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
alanfgates
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
Hortonworks
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFT
DataWorks Summit
 
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data TransformationsKafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Apache Apex
 
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
DataWorks Summit
 
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
alanfgates
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
Yifeng Jiang
 
Migrating pipelines into Docker
Migrating pipelines into DockerMigrating pipelines into Docker
Migrating pipelines into Docker
DataWorks Summit/Hadoop Summit
 

What's hot (20)

From Device to Data Center to Insights
From Device to Data Center to InsightsFrom Device to Data Center to Insights
From Device to Data Center to Insights
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Apache Hive ACID Project
Apache Hive ACID ProjectApache Hive ACID Project
Apache Hive ACID Project
 
LLAP Nov Meetup
LLAP Nov MeetupLLAP Nov Meetup
LLAP Nov Meetup
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobilePractice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China Mobile
 
Hive acid and_2.x new_features
Hive acid and_2.x new_featuresHive acid and_2.x new_features
Hive acid and_2.x new_features
 
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage SubsystemEvolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hive
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
 
LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BI
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
 
Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it final
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFT
 
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data TransformationsKafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
 
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
 
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
 
Migrating pipelines into Docker
Migrating pipelines into DockerMigrating pipelines into Docker
Migrating pipelines into Docker
 

Similar to A TPC Benchmark of Hive LLAP and Comparison with Presto

Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 edition
Bob Ward
 
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongUnlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Ceph Community
 
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Cloudera, Inc.
 
SQL Server It Just Runs Faster
SQL Server It Just Runs FasterSQL Server It Just Runs Faster
SQL Server It Just Runs Faster
Bob Ward
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
Nicolas Poggi
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
DataWorks Summit/Hadoop Summit
 
Ceph Performance Profiling and Reporting
Ceph Performance Profiling and ReportingCeph Performance Profiling and Reporting
Ceph Performance Profiling and Reporting
Ceph Community
 
[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf
[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf
[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf
MukundThakur22
 
20140120 presto meetup_en
20140120 presto meetup_en20140120 presto meetup_en
20140120 presto meetup_en
Ogibayashi
 
The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)
Nicolas Poggi
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
Cloudera, Inc.
 
Super Computer
Super ComputerSuper Computer
Super Computer
gueste3bbd0
 
Using Derivation-Free Optimization in the Hadoop Cluster with Terasort
Using Derivation-Free Optimization in the Hadoop Cluster  with TerasortUsing Derivation-Free Optimization in the Hadoop Cluster  with Terasort
Using Derivation-Free Optimization in the Hadoop Cluster with Terasort
Anhanguera Educacional S/A
 
How Many Slaves (Ukoug)
How Many Slaves (Ukoug)How Many Slaves (Ukoug)
How Many Slaves (Ukoug)
Doug Burns
 
PostgreSQL 10: What to Look For
PostgreSQL 10: What to Look ForPostgreSQL 10: What to Look For
PostgreSQL 10: What to Look For
Amit Langote
 
Hadoop Hardware @Twitter: Size does matter!
Hadoop Hardware @Twitter: Size does matter!Hadoop Hardware @Twitter: Size does matter!
Hadoop Hardware @Twitter: Size does matter!
DataWorks Summit
 
Deep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDeep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce Spitler
Databricks
 
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...
Databricks
 
Linux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownLinux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance Showdown
ScyllaDB
 
WMS Performance Shootout 2009
WMS Performance Shootout 2009WMS Performance Shootout 2009
WMS Performance Shootout 2009
Jeff McKenna
 

Similar to A TPC Benchmark of Hive LLAP and Comparison with Presto (20)

Sql server 2016 it just runs faster sql bits 2017 edition
Sql server 2016 it just runs faster   sql bits 2017 editionSql server 2016 it just runs faster   sql bits 2017 edition
Sql server 2016 it just runs faster sql bits 2017 edition
 
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongUnlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
 
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
 
SQL Server It Just Runs Faster
SQL Server It Just Runs FasterSQL Server It Just Runs Faster
SQL Server It Just Runs Faster
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
 
Ceph Performance Profiling and Reporting
Ceph Performance Profiling and ReportingCeph Performance Profiling and Reporting
Ceph Performance Profiling and Reporting
 
[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf
[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf
[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf
 
20140120 presto meetup_en
20140120 presto meetup_en20140120 presto meetup_en
20140120 presto meetup_en
 
The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Super Computer
Super ComputerSuper Computer
Super Computer
 
Using Derivation-Free Optimization in the Hadoop Cluster with Terasort
Using Derivation-Free Optimization in the Hadoop Cluster  with TerasortUsing Derivation-Free Optimization in the Hadoop Cluster  with Terasort
Using Derivation-Free Optimization in the Hadoop Cluster with Terasort
 
How Many Slaves (Ukoug)
How Many Slaves (Ukoug)How Many Slaves (Ukoug)
How Many Slaves (Ukoug)
 
PostgreSQL 10: What to Look For
PostgreSQL 10: What to Look ForPostgreSQL 10: What to Look For
PostgreSQL 10: What to Look For
 
Hadoop Hardware @Twitter: Size does matter!
Hadoop Hardware @Twitter: Size does matter!Hadoop Hardware @Twitter: Size does matter!
Hadoop Hardware @Twitter: Size does matter!
 
Deep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce SpitlerDeep Learning with Apache Spark and GPUs with Pierce Spitler
Deep Learning with Apache Spark and GPUs with Pierce Spitler
 
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...
 
Linux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance ShowdownLinux Kernel vs DPDK: HTTP Performance Showdown
Linux Kernel vs DPDK: HTTP Performance Showdown
 
WMS Performance Shootout 2009
WMS Performance Shootout 2009WMS Performance Shootout 2009
WMS Performance Shootout 2009
 

More from Yu Liu

Cloud Era Transactional Processing -- Problems, Strategies and Solutions
Cloud Era Transactional Processing -- Problems, Strategies and SolutionsCloud Era Transactional Processing -- Problems, Strategies and Solutions
Cloud Era Transactional Processing -- Problems, Strategies and Solutions
Yu Liu
 
Introduction to NTCIR 2016 MedNLPDoc
Introduction to NTCIR 2016 MedNLPDocIntroduction to NTCIR 2016 MedNLPDoc
Introduction to NTCIR 2016 MedNLPDoc
Yu Liu
 
高性能データ処理プラットフォーム (Talk on July Tech Festa 2015)
高性能データ処理プラットフォーム (Talk on July Tech Festa 2015)高性能データ処理プラットフォーム (Talk on July Tech Festa 2015)
高性能データ処理プラットフォーム (Talk on July Tech Festa 2015)
Yu Liu
 
Survey on Parallel/Distributed Search Engines
Survey on Parallel/Distributed Search EnginesSurvey on Parallel/Distributed Search Engines
Survey on Parallel/Distributed Search Engines
Yu Liu
 
Paper introduction to Combinatorial Optimization on Graphs of Bounded Treewidth
Paper introduction to Combinatorial Optimization on Graphs of Bounded TreewidthPaper introduction to Combinatorial Optimization on Graphs of Bounded Treewidth
Paper introduction to Combinatorial Optimization on Graphs of Bounded Treewidth
Yu Liu
 
Paper Introduction: Combinatorial Model and Bounds for Target Set Selection
Paper Introduction: Combinatorial Model and Bounds for Target Set SelectionPaper Introduction: Combinatorial Model and Bounds for Target Set Selection
Paper Introduction: Combinatorial Model and Bounds for Target Set Selection
Yu Liu
 
An accumulative computation framework on MapReduce ppl2013
An accumulative computation framework on MapReduce ppl2013An accumulative computation framework on MapReduce ppl2013
An accumulative computation framework on MapReduce ppl2013
Yu Liu
 
An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)
Yu Liu
 
A Homomorphism-based Framework for Systematic Parallel Programming with MapRe...
A Homomorphism-based Framework for Systematic Parallel Programming with MapRe...A Homomorphism-based Framework for Systematic Parallel Programming with MapRe...
A Homomorphism-based Framework for Systematic Parallel Programming with MapRe...
Yu Liu
 
An Introduction of Recent Research on MapReduce (2011)
An Introduction of Recent Research on MapReduce (2011)An Introduction of Recent Research on MapReduce (2011)
An Introduction of Recent Research on MapReduce (2011)
Yu Liu
 
A Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on SparkA Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on Spark
Yu Liu
 
Introduction of A Lightweight Stage-Programming Framework
Introduction of A Lightweight Stage-Programming FrameworkIntroduction of A Lightweight Stage-Programming Framework
Introduction of A Lightweight Stage-Programming Framework
Yu Liu
 
Start From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize AlgorithmStart From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize Algorithm
Yu Liu
 
Introduction of the Design of A High-level Language over MapReduce -- The Pig...
Introduction of the Design of A High-level Language over MapReduce -- The Pig...Introduction of the Design of A High-level Language over MapReduce -- The Pig...
Introduction of the Design of A High-level Language over MapReduce -- The Pig...
Yu Liu
 
On Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and ExperimentsOn Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and Experiments
Yu Liu
 
Tree representation in map reduce world
Tree representation  in map reduce worldTree representation  in map reduce world
Tree representation in map reduce world
Yu Liu
 
Introduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applicationsIntroduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applications
Yu Liu
 
On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)
Yu Liu
 
ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop
ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on HadoopScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop
ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop
Yu Liu
 
A Homomorphism-based MapReduce Framework for Systematic Parallel Programming
A Homomorphism-based MapReduce Framework for Systematic Parallel ProgrammingA Homomorphism-based MapReduce Framework for Systematic Parallel Programming
A Homomorphism-based MapReduce Framework for Systematic Parallel Programming
Yu Liu
 

More from Yu Liu (20)

Cloud Era Transactional Processing -- Problems, Strategies and Solutions
Cloud Era Transactional Processing -- Problems, Strategies and SolutionsCloud Era Transactional Processing -- Problems, Strategies and Solutions
Cloud Era Transactional Processing -- Problems, Strategies and Solutions
 
Introduction to NTCIR 2016 MedNLPDoc
Introduction to NTCIR 2016 MedNLPDocIntroduction to NTCIR 2016 MedNLPDoc
Introduction to NTCIR 2016 MedNLPDoc
 
高性能データ処理プラットフォーム (Talk on July Tech Festa 2015)
高性能データ処理プラットフォーム (Talk on July Tech Festa 2015)高性能データ処理プラットフォーム (Talk on July Tech Festa 2015)
高性能データ処理プラットフォーム (Talk on July Tech Festa 2015)
 
Survey on Parallel/Distributed Search Engines
Survey on Parallel/Distributed Search EnginesSurvey on Parallel/Distributed Search Engines
Survey on Parallel/Distributed Search Engines
 
Paper introduction to Combinatorial Optimization on Graphs of Bounded Treewidth
Paper introduction to Combinatorial Optimization on Graphs of Bounded TreewidthPaper introduction to Combinatorial Optimization on Graphs of Bounded Treewidth
Paper introduction to Combinatorial Optimization on Graphs of Bounded Treewidth
 
Paper Introduction: Combinatorial Model and Bounds for Target Set Selection
Paper Introduction: Combinatorial Model and Bounds for Target Set SelectionPaper Introduction: Combinatorial Model and Bounds for Target Set Selection
Paper Introduction: Combinatorial Model and Bounds for Target Set Selection
 
An accumulative computation framework on MapReduce ppl2013
An accumulative computation framework on MapReduce ppl2013An accumulative computation framework on MapReduce ppl2013
An accumulative computation framework on MapReduce ppl2013
 
An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)An Enhanced MapReduce Model (on BSP)
An Enhanced MapReduce Model (on BSP)
 
A Homomorphism-based Framework for Systematic Parallel Programming with MapRe...
A Homomorphism-based Framework for Systematic Parallel Programming with MapRe...A Homomorphism-based Framework for Systematic Parallel Programming with MapRe...
A Homomorphism-based Framework for Systematic Parallel Programming with MapRe...
 
An Introduction of Recent Research on MapReduce (2011)
An Introduction of Recent Research on MapReduce (2011)An Introduction of Recent Research on MapReduce (2011)
An Introduction of Recent Research on MapReduce (2011)
 
A Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on SparkA Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on Spark
 
Introduction of A Lightweight Stage-Programming Framework
Introduction of A Lightweight Stage-Programming FrameworkIntroduction of A Lightweight Stage-Programming Framework
Introduction of A Lightweight Stage-Programming Framework
 
Start From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize AlgorithmStart From A MapReduce Graph Pattern-recognize Algorithm
Start From A MapReduce Graph Pattern-recognize Algorithm
 
Introduction of the Design of A High-level Language over MapReduce -- The Pig...
Introduction of the Design of A High-level Language over MapReduce -- The Pig...Introduction of the Design of A High-level Language over MapReduce -- The Pig...
Introduction of the Design of A High-level Language over MapReduce -- The Pig...
 
On Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and ExperimentsOn Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and Experiments
 
Tree representation in map reduce world
Tree representation  in map reduce worldTree representation  in map reduce world
Tree representation in map reduce world
 
Introduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applicationsIntroduction to Ultra-succinct representation of ordered trees with applications
Introduction to Ultra-succinct representation of ordered trees with applications
 
On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)On Implementation of Neuron Network(Back-propagation)
On Implementation of Neuron Network(Back-propagation)
 
ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop
ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on HadoopScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop
ScrewDriver Rebirth: Generate-Test-and-Aggregate Framework on Hadoop
 
A Homomorphism-based MapReduce Framework for Systematic Parallel Programming
A Homomorphism-based MapReduce Framework for Systematic Parallel ProgrammingA Homomorphism-based MapReduce Framework for Systematic Parallel Programming
A Homomorphism-based MapReduce Framework for Systematic Parallel Programming
 

Recently uploaded

Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
fkyes25
 

Recently uploaded (20)

Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
 

A TPC Benchmark of Hive LLAP and Comparison with Presto

  • 1. A Benchmark & Comparison on Hive (2.1.0) LLAP and Presto (0.208) Data Engineering@SmartNews Yu Liu 2018/10/01
  • 2. Background and Preliminaries TPC-DS and TPC-H benchmark suits Hive LLAP (link) Similar results from Hortonworks and Microsoft
  • 3. Conclusions for Impatience ❏ Hive LLAP brings significant improvements on performance ❏ Hive LLAP is outperformed on TPC-DS, compared with non-LLAP and Presto ❏ Presto shows some advantages on TPC-H ❏ Hive LLAP causes bigger footprints of RAM usage and need careful tuning
  • 4. Environment Setting Cluster configurations (AWS EC2): ❏ 1 X Master : r4.xlarge (4 vCPU - 32GB RAM) ❏ 10 X Workers : i3.large (2 vCPU - 16GB RAM) Stacks: ❏ HDP 2.6 (Hive 2.1.0, Tez 0.7.0, Calcite 1.2.0) ❏ Presto 0.208 TPC Data Sets ❏ 10 GB Text/ORC format for both DS and H
  • 5. Results and Comparisons The way of calculating query performance in this table including “query-duration” and “job-submission” time.
  • 6. Dive into Details In total 99 queries Hive LLAP Presto Comparison (H v P) Faster cases (num) 53 17 3.1 times Total Runtime (s) 1351 2058 65.6% (1.5 times faster) Failed cases (num) 0 21 (OOM or syntax error) Due to the time constraint, the benchmark used a same set of SQLs for Hive 2
  • 7. Biggest Differences Query No. Presto (sec.) Hive LLAP (sec.) Difference ratio query39 188 29 6.48 query47 56 13 4.31 query9 28 7 4 query44 8 30 3.75 query22 48 161 3.35 query88 50 15 3.33
  • 8. Compare with Current EMR(5) Installation Conditions: ➔ Input Data: same (24 tables,10 GB text and converted to ORC format) ➔ TPC-DS queries on ORC tables ➔ EMR 5 instances spec. are much better (worker nodes = 16 vCPU, 32GB RAM x 10) Results: tens of times faster than EMR5 (details in next slide) Reasons: 1. LLAP setting 2. ORC file format issue 3. Other Hive configurations
  • 9. TPC-DS Queries Duration Time Query No. Run on EMR(5) in sec. Run on Hive LLAP in sec. Difference (times) No. 1 130.717 1.855 70 No. 11 184.94 7.434 25 No. 12 31.565 1.384 23 No. 50 90.435 11.576 8 No. 66 140.1 3.742 38 The way of calculating query performance in this table only including “query-duration” time.
  • 10. Many other details are omitted The following details are omitted because of space, in general and obviously they are slower ❏ Hive with LLAP on MR engine ❏ Hive with Tez without LLAP
  • 11. Future work ❏ Hive LLAP with Druid as the storage engine ❏ Compare with EMR 5.0 installation (with some constraint on supporting ORC) ❏ Parquet vs ORC ❏ ORC with different compaction strategies (BI/ETL/Hybrid) ❏ Evaluation on creations of bloom filter and zone map