Hive, Presto, and Spark on TPC-DS benchmark

I ran several SQL-on-Hadoop systems, namely Hive (LLAP), Presto, and Spark (Whole Stage Code Generation), on the TPC-DS benchmark.

Published in: Technology

  1. Hive, Presto, and Spark on TPC-DS benchmark (Dongwon Kim, PhD, SK Telecom)
  2. Contents
     • Experimental setup
     • Experimental results
  3. [Experimental setup] TPC-DS dataset and queries
     • Hive
       • Relies entirely on github.com/hortonworks/hive-testbench
         • Distributed data generator
           • A small dataset (100GB)
           • A large dataset (1TB)
         • DDLs
           • External table declaration
           • Partitioned table declaration (ORC)
         • 66 queries provided (out of 99 TPC-DS queries)
     • Presto
       • Uses the hive-hadoop2 connector to read the same partitioned tables
       • Uses the same queries
     • Spark
       • Connected to the Hive MetaStore to read the same partitioned tables
       • Uses the same queries
  4. [Experimental setup] Cluster setup
     • A single master node with 5 slave nodes
       • Two 12-core processors with 48 hyper-threads in total
       • 128GB main memory
       • 10 HDDs
     • Hadoop 2.7.3
     • Hive 2.1.1 + Tez 0.8.4
       • An LLAP worker on a node uses 192 cores and 80GB
     • Presto 0.162
       • A Presto worker on a node uses 192 cores and 80GB
       • distributed-joins-enabled = false
     • Spark 2.0.2
       • 4 Spark executors on a node use 192 cores and 80GB
  5. [Experimental setup] Performance monitoring tool
     • github.com/eastcirclek/swimlane-graphs
       • Hive/Presto/Spark task swimlane graph + Ganglia resource utilization graph
       • Used to identify the main cause of performance bottlenecks
  6. [Experimental results] Characteristics of each engine
     • Hive
       • Improves significantly with LLAP
       • Good for both small and large workloads
       • Especially good for IO-bound workloads
     • Spark
       • Improves CPU performance through Whole Stage Code Generation
       • Especially good for CPU-bound workloads
       • Does not outperform Hive and Presto on IO-bound workloads
     • Presto
       • Pipelined execution reduces unnecessary disk IO
       • Good for simple queries
       • Works okay only when the data fits into memory
  7. [Experimental results] Query execution time (100GB)
     Pairwise comparison: reduction in sum of running times*
     • With query72
       • Spark > Hive: 26.3% (1668s → 1229s)
       • Hive > Presto: 55.6% (2797s → 1241s)
       • Spark > Presto: 62.0% (2932s → 1114s)
       • Overall: Spark > Hive >>> Presto
     • Without query72
       • Hive > Spark: 19.8% (1143s → 916s) (reversed)
       • Hive > Presto: 50.2% (982s → 489s)
       • Spark > Presto: 5.2% (1116s → 1057s) (gap reduced significantly)
       • Overall: Hive > Spark >= Presto
     • Hive with LLAP is good even for small workloads
     * When comparing each pair of engines, I count only the queries completed by both engines.
  8. [Experimental results] Query execution time (1TB)
     Pairwise comparison: reduction in sum of running times*
     • With query72
       • Hive > Spark: 28.2% (6445s → 4625s)
       • Hive > Presto: 56.4% (5567s → 2426s)
       • Spark > Presto: 29.2% (5685s → 4026s)
       • Overall: Hive > Spark >>> Presto
     • Without query72
       • Hive > Spark: 41.3% (6165s → 3629s)
       • Hive > Presto: 25.5% (1460s → 1087s)
       • Presto > Spark: 58.6% (3812s → 1578s) (reversed)
       • Overall: Hive > Presto > Spark
     • Hive with LLAP is good for large, I/O-bound workloads
     • Presto works okay if the data fits into memory
     * When comparing each pair of engines, I count only the queries completed by both engines.
  9. [Experimental results] Curse of Query72 on Hive and Presto (100GB dataset)
     [Figure: three pairwise per-query charts (Presto vs. Spark, Presto vs. Hive, Spark vs. Hive); value axis from 0 to 2000; query 72 is listed first in every panel]
  10. [Experimental results] Curse of Query72 on Hive and Presto (1TB dataset)
      [Figure: three pairwise per-query charts (Presto vs. Hive, Spark vs. Hive, Presto vs. Spark); value axis from 0 to 4000; query 72 is listed first in every panel]
  11. [Experimental results] Curse of Query72 on Presto and Hive
      [Figure: resource utilization graphs for Presto, Spark, and Hive]
      • Presto and Hive: high CPU utilization for a long time (looks like a “plateau”)
      • Spark: no plateau observed thanks to WholeStageCodeGen
  12. [Experimental results] Whole Stage Code Generation
      [Figure: CPU, network, and disk utilization w/ WholeStageCodeGen and w/o WholeStageCodeGen]
      • Without WholeStageCodeGeneration, a “plateau” is observed, as with Presto and Hive
      • The 10th stage takes much longer without WholeStageCodeGeneration
  13. [Experimental results] Whole Stage Code Generation (100GB dataset)
      [Figure: per-query chart comparing w/ WSCG and w/o WSCG; logarithmic value axis from 1 to 1000]
  14. [Experimental results] Performance of Hive w/ and w/o LLAP
      [Figure: per-query charts comparing llap and container for the 100GB dataset and the 1TB dataset; logarithmic value axis from 1 to 10000]
      • LLAP shows improvement over container-based Tez for most queries
      • Reductions reported on the slide:
        • All queries: 78.3% reduction; all except 72: 36.1% reduction
        • All queries: 44.9% reduction; all except 72: 27.9% reduction
  15. [Experimental results][Query 75] without and with LLAP
      [Figure: panels without LLAP and with LLAP]
  16. [Experimental results][Query 93] without and with LLAP
      [Figure: panels without LLAP and with LLAP]
  17. [Experimental results][Query 94] without and with LLAP
      [Figure: panels without LLAP and with LLAP]
  18. [Experimental results][Query 93] Different patterns of resource utilization
      • Presto, Hive: Network → CPU
      • Spark: CPU → Network
  19. The end
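
The pairwise comparison used on the two query execution time slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's tooling. It assumes that each parenthesized pair of totals reads "slower engine's sum, then faster engine's sum" and that the reduction is (slower - faster) / slower. The function names `reduction_pct` and `pairwise_totals` and the `engine_x` / `engine_y` timings are hypothetical. Only the `reported_1tb` rows are copied from the 1TB slide.

```python
from typing import Dict, Iterable, Optional, Tuple

# Running time in seconds per query id; None marks a query the engine did not complete.
Times = Dict[int, Optional[float]]


def reduction_pct(slower_s: float, faster_s: float) -> float:
    """Percentage by which the faster total undercuts the slower total (assumed formula)."""
    return 100.0 * (slower_s - faster_s) / slower_s


def pairwise_totals(a: Times, b: Times, exclude: Iterable[int] = ()) -> Tuple[float, float]:
    """Sum running times over the queries completed by both engines."""
    skipped = set(exclude)
    common = [
        q for q in a
        if q not in skipped and a[q] is not None and b.get(q) is not None
    ]
    return sum(a[q] for q in common), sum(b[q] for q in common)


# Rows copied from the 1TB slide:
# (label, slower total [s], faster total [s], printed reduction [%])
reported_1tb = [
    ("with q72: Hive > Spark", 6445, 4625, 28.2),
    ("with q72: Hive > Presto", 5567, 2426, 56.4),
    ("with q72: Spark > Presto", 5685, 4026, 29.2),
    ("without q72: Hive > Spark", 6165, 3629, 41.3),
    ("without q72: Hive > Presto", 1460, 1087, 25.5),
    ("without q72: Presto > Spark", 3812, 1578, 58.6),
]

for label, slower, faster, printed in reported_1tb:
    print(f"{label}: computed {reduction_pct(slower, faster):.1f}%, slide {printed}%")

# Hypothetical timings (NOT from the slides): one outlier query flips the ranking.
engine_x: Times = {3: 10.0, 7: 20.0, 72: 900.0, 89: None}
engine_y: Times = {3: 15.0, 7: 30.0, 72: 100.0, 89: 40.0}

for excluded in ([], [72]):
    total_x, total_y = pairwise_totals(engine_x, engine_y, exclude=excluded)
    slower, faster = max(total_x, total_y), min(total_x, total_y)
    print(f"excluding {excluded}: x={total_x}s, y={total_y}s, "
          f"reduction {reduction_pct(slower, faster):.1f}%")
```

Under the assumed formula, the recomputed 1TB percentages agree with the printed ones to within about 0.2 percentage points. With the made-up timings, engine y has the smaller total when query 72 is included and engine x has the smaller total once it is excluded, which is the same kind of reversal the slides report. Query 89 is ignored in both cases because engine x did not complete it.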
