Hive, Presto, and Spark on TPC-DS benchmark

Dongwon Kim, PhD
SK Telecom

Contents
• Experimental setup
• Experimental results

[Experimental setup] TPC-DS dataset and query
• Hive
• Depends entirely on github.com/hortonworks/hive-testbench
• Distributed data generator
• A small dataset (100GB)
• A large dataset (1TB)
• DDLs
• External table declaration
• Partitioned table declaration (ORC)
• 66 queries provided (out of 99 TPC-DS queries)
• Presto
• Use hive-hadoop2 connector to read the same partitioned table
• Use the same query
• Spark
• Connected to Hive MetaStore to read the same partitioned table
• Use the same query

[Experimental setup] Cluster setup
• A single master node with 5 slave nodes
• Two 12-core processors with 48 hyper-threads in total
• 128GB main memory
• 10 HDDs
• Hadoop 2.7.3
• Hive 2.1.1 + Tez 0.8.4
• An LLAP worker on a node uses 192 cores and 80GB
• Presto 0.162
• A Presto worker on a node uses 192 cores and 80GB
• distributed-joins-enabled = false
• Spark 2.0.2
• 4 Spark executors on a node use 192 cores and 80GB
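
With `distributed-joins-enabled = false`, Presto uses broadcast joins instead of hash-partitioned joins, so the build side of a join must fit in memory on every worker. The following is a minimal Python sketch of the two strategies, not Presto code; the function names, the toy rows, and the worker count are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows):
    """Join two lists of (key, value) rows on key with an in-memory hash table."""
    table = defaultdict(list)
    for key, value in build_rows:
        table[key].append(value)
    return [(key, bv, pv) for key, pv in probe_rows for bv in table.get(key, [])]

def broadcast_join(build_rows, probe_rows, workers):
    """Every worker holds a full copy of the build side and a slice of the probe side."""
    out, rows_held = [], 0
    for w in range(workers):
        probe_slice = probe_rows[w::workers]   # round-robin split of the probe side
        rows_held += len(build_rows)           # full build-side copy on each worker
        out += hash_join(build_rows, probe_slice)
    return sorted(out), rows_held

def partitioned_join(build_rows, probe_rows, workers):
    """Both sides are redistributed by hash of the join key; each worker holds one shard."""
    out, rows_held = [], 0
    for w in range(workers):
        build_shard = [r for r in build_rows if hash(r[0]) % workers == w]
        probe_shard = [r for r in probe_rows if hash(r[0]) % workers == w]
        rows_held += len(build_shard)
        out += hash_join(build_shard, probe_shard)
    return sorted(out), rows_held

# Hypothetical toy data: 10 dimension rows joined with 40 fact rows on 5 workers.
build = [(k, f"dim{k}") for k in range(10)]
probe = [(i % 10, f"fact{i}") for i in range(40)]
b_out, b_held = broadcast_join(build, probe, workers=5)
p_out, p_held = partitioned_join(build, probe, workers=5)
print(b_out == p_out, b_held, p_held)
```

Both strategies return the same rows, but the broadcast variant keeps one copy of the build side per worker, which is one way to read the later observation that Presto works okay only when the data fits into memory.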

[Experimental setup] Performance monitoring tool
• github.com/eastcirclek/swimlane-graphs
• Hive/Presto/Spark task swimlane graph + Ganglia resource utilization graph
• Used to identify the main cause of a performance bottleneck
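
A swimlane graph draws each task as a horizontal bar in a lane, so that tasks in the same lane never overlap in time. The sketch below shows one simple way to assign lanes from task start and end times. It is an assumption about how such a graph can be laid out, not the implementation in swimlane-graphs; the function name and the sample tasks are hypothetical.

```python
import heapq

def assign_lanes(tasks):
    """Assign each (name, start, end) task to the lowest-numbered free lane.

    Returns a dict mapping task name to lane index.
    """
    lanes = {}
    busy = []        # min-heap of (end_time, lane) for occupied lanes
    free = []        # min-heap of lane indices that have been released
    next_lane = 0
    for name, start, end in sorted(tasks, key=lambda t: t[1]):
        # Release every lane whose task finished before this task starts.
        while busy and busy[0][0] <= start:
            _, lane = heapq.heappop(busy)
            heapq.heappush(free, lane)
        if free:
            lane = heapq.heappop(free)
        else:
            lane = next_lane
            next_lane += 1
        lanes[name] = lane
        heapq.heappush(busy, (end, lane))
    return lanes

# Hypothetical task timings in seconds.
sample_tasks = [("map-0", 0, 5), ("map-1", 0, 3), ("map-2", 3, 8), ("reduce-0", 8, 12)]
sample_lanes = assign_lanes(sample_tasks)
print(sample_lanes)
```

Plotting the lanes against a shared time axis next to the Ganglia CPU, network, and disk graphs makes it possible to see which tasks were running during a given resource spike.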

[Experimental results] Characteristics of each engine
• Hive
• Improves significantly with LLAP
• Good for both small and large workloads
• Especially good for IO-bound workloads
• Spark
• Improves CPU performance through Whole Stage Code Generation
• Especially good for CPU-bound workloads
• Does not outperform Hive and Presto for IO-bound workloads
• Presto
• Pipelined execution to reduce unnecessary disk IOs
• Good for simple queries
• Works okay only when the data fits into memory

[Experimental results] Query execution time (100GB)
Pairwise comparison: reduction in sum of running times (slower engine's total → faster engine's total)
• With query72
• Spark > Hive: 26.3% (1668s → 1229s)
• Hive > Presto: 55.6% (2797s → 1241s)
• Spark > Presto: 62.0% (2932s → 1114s)
• Overall: Spark > Hive >>> Presto
• Without query72
• Hive > Spark: 19.8% (1143s → 916s), ranking reversed
• Hive > Presto: 982s → 489s
• Spark > Presto: 5.2% (1116s → 1057s), gap reduced significantly
• Overall: Hive > Spark >= Presto
• Hive with LLAP is good even for small workloads
* When comparing each pair of engines, I count only the queries that are completed by both engines.
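
The reduction percentages can be recomputed from the pairs of total running times on this slide. The sketch below assumes that each pair reads as (slower engine's total, faster engine's total) and that the reduction is the difference divided by the slower total; the function name is hypothetical. The recomputed values agree with the printed percentages to within 0.1 percentage point, presumably because the slide shows rounded seconds.

```python
def reduction_pct(slower_s, faster_s):
    """Percentage by which the faster total undercuts the slower total."""
    return (slower_s - faster_s) / slower_s * 100.0

# (slower total in s, faster total in s, percentage printed on the slide)
labeled_100gb = [
    (1668, 1229, 26.3),   # Spark > Hive, with query72
    (2797, 1241, 55.6),   # Hive > Presto, with query72
    (2932, 1114, 62.0),   # Spark > Presto, with query72
    (1143, 916, 19.8),    # Hive > Spark, without query72
    (1116, 1057, 5.2),    # Spark > Presto, without query72
]
for slower, faster, printed in labeled_100gb:
    computed = reduction_pct(slower, faster)
    print(f"{slower}s -> {faster}s: computed {computed:.1f}%, slide {printed}%")

# The pair (982s, 489s) carries no percentage on the slide; the same formula applies.
unlabeled = reduction_pct(982, 489)
print(f"982s -> 489s: computed {unlabeled:.1f}%")
```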

[Experimental results] Query execution time (1TB)
Pairwise comparison: reduction in sum of running times (slower engine's total → faster engine's total)
• With query72
• Hive > Spark: 28.2% (6445s → 4625s)
• Hive > Presto: 5567s → 2426s
• Spark > Presto: 29.2% (5685s → 4026s)
• Overall: Hive > Spark >>> Presto
• Without query72
• Hive > Spark: 41.3% (6165s → 3629s)
• Hive > Presto: 1460s → 1087s
• Presto > Spark: 58.6% (3812s → 1578s), ranking reversed
• Overall: Hive > Presto > Spark
• Hive with LLAP is good for large, I/O-bound workloads
• Presto works okay if the data fits into memory
* When comparing each pair of engines, I count only the queries that are completed by both engines.
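
The footnote's rule, counting only queries completed by both engines in a pair, can be sketched as follows. The per-query times here are hypothetical and are not taken from the benchmark; `None` marks a query that an engine did not complete, and the function name is an assumption.

```python
def pairwise_totals(times_a, times_b):
    """Sum running times over the queries that both engines completed.

    times_a and times_b map query id -> seconds, or None if the query failed.
    Returns (common query ids, total for a, total for b).
    """
    common = sorted(
        q for q in times_a
        if times_a[q] is not None and times_b.get(q) is not None
    )
    total_a = sum(times_a[q] for q in common)
    total_b = sum(times_b[q] for q in common)
    return common, total_a, total_b

def drop_query(times, query_id):
    """Return a copy of the timing table without one query."""
    return {q: t for q, t in times.items() if q != query_id}

# Hypothetical timings: engine A fails query 88, engine B never ran query 88 or has 93 extra.
engine_a = {3: 10.0, 7: 25.0, 72: 900.0, 88: None}
engine_b = {3: 12.0, 7: 20.0, 72: 150.0, 93: 40.0}

with_q72 = pairwise_totals(engine_a, engine_b)
without_q72 = pairwise_totals(drop_query(engine_a, 72), drop_query(engine_b, 72))
print(with_q72)
print(without_q72)
```

Because the common set depends on the pair, the same engine can show different totals in different comparisons: on this slide Spark totals 6445s when paired with Hive but 4026s when paired with Presto (both with query72). The sketch also shows how a single dominant query can decide which engine wins on the sum.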

[Experimental results] Curse of Query72 on Hive and Presto
[Figure: per-query running times on the 100GB dataset (y-axis 0 to 2000) in three pairwise panels: Presto vs. Spark, Presto vs. Hive, and Spark vs. Hive. Query 72 is the first query on the x-axis of every panel.]

[Experimental results] Curse of Query72 on Hive and Presto
[Figure: per-query running times on the 1TB dataset (y-axis 0 to 4000) in three pairwise panels: Presto vs. Hive, Spark vs. Hive, and Presto vs. Spark. Query 72 is the first query on the x-axis of every panel.]

[Experimental results] Curse of Query72 on Presto and Hive
• Presto and Hive: high CPU utilization for a long time (looks like a “plateau”)
• Spark: no plateau observed, thanks to WholeStageCodeGen

[Experimental results] Whole Stage Code Generation
[Figure: CPU, network, and disk utilization w/ WholeStageCodeGen and w/o WholeStageCodeGen, with panels labeled Presto and Hive]
• Without WholeStageCodeGeneration, a “plateau” is observed, as with Presto and Hive
• The 10th stage takes much longer without WholeStageCodeGeneration
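
Whole-stage code generation collapses a chain of operators into a single generated function, which removes the per-row cost of passing rows between operators. Spark does this by emitting Java source at runtime; the Python sketch below only imitates the idea. The operators, column names, and rows are hypothetical.

```python
def scan(rows):
    for row in rows:
        yield row

def filter_op(child, predicate):
    for row in child:
        if predicate(row):
            yield row

def project_op(child, fn):
    for row in child:
        yield fn(row)

def interpreted_sum(rows):
    """Operator-at-a-time pipeline: each row is handed through a chain of iterators."""
    pipeline = project_op(
        filter_op(scan(rows), lambda r: r["qty"] > 2),
        lambda r: r["qty"] * r["price"],
    )
    return sum(pipeline)

def generate_fused_sum():
    """Generate source for one fused loop and compile it, imitating code generation."""
    source = (
        "def fused_sum(rows):\n"
        "    total = 0\n"
        "    for r in rows:\n"
        "        if r['qty'] > 2:\n"
        "            total += r['qty'] * r['price']\n"
        "    return total\n"
    )
    namespace = {}
    exec(source, namespace)   # compile the generated source into a callable
    return namespace["fused_sum"]

# Hypothetical rows: quantities 0..5 at a fixed price of 10.
rows = [{"qty": q, "price": 10} for q in range(6)]
fused_sum = generate_fused_sum()
print(interpreted_sum(rows), fused_sum(rows))
```

Both versions compute the same result; the fused version does it in one loop with no intermediate iterators, which is the kind of CPU saving the slides credit for the missing plateau.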

[Experimental results] Whole Stage Code Generation
[Figure: per-query running times w/ WSCG and w/o WSCG on the 100GB dataset (log-scale y-axis, 1 to 1000)]

[Experimental results] Performance of Hive w/ and w/o LLAP
[Figure: per-query running times, llap vs. container, on the 100GB dataset and on the 1TB dataset (log-scale y-axis, 1 to 10000)]
• LLAP shows improvement over container-based Tez for most queries
• All queries: 78.3% reduction
• All except 72: 36.1% reduction
• All queries: 44.9% reduction
• All except 72: 27.9% reduction

[Experimental results][Query 75] without and with LLAP
[Figure: panels for query 75 without LLAP and with LLAP]

[Experimental results][Query 93] Different patterns of resource utilization
• Presto and Hive: Network → CPU
• Spark: CPU → Network
