Hive on spark is blazing fast or is it final

© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Hive on Spark is Blazing Fast… Or Is It?
Carter Shanklin and Mostafa Mokhtar

Why SQL on Hadoop? Solving for Scale.
"Hadoop is great for cost, but MapReduce is too difficult."
"SQL on Hadoop makes Hadoop real and gives me scale that traditional SQL can’t offer."
"I’m deleting important data because it’s too expensive to store it."

SQL at Facebook: Emergence of Apache Hive
Developed Hive to address traditional RDBMS limitations.
300+ PB of data under management(1).
600+ TB of data loaded daily.
60,000+ Hive queries per day(2).
More than 1,000 users per day.
Initial Apache release in April 2009.

Hive Classic: Strengths and Challenges
+ Familiar SQL Interface
+ Economical Processing of Petabytes
Hive Classic tied to MapReduce, leading to latency
Traditional SQL Workloads Needed Higher Performance!

Need for Speed: The Stinger Initiative
Stinger: An Open Roadmap to improve Apache Hive’s performance 100x.
Launched: February 2013; Delivered: April 2014.
Delivered in 100% Apache Open Source.
SQL Engine (Vectorized SQL Engine) + Columnar Storage (ORCFile) + Distributed Execution (Apache Tez) = 100X+

Stinger Phase 3: TPC-DS Benchmark at 30 Terabyte Scale
Sample of 50 queries from TPC-DS at 30 terabyte scale.
Average 52x Query Speedup, Maximum 160x Query Speedup.
Total benchmark time decreased from 7.8 days to 9.3 hours.(3)
Cost-Based Optimizer added in Hive 0.14 gave an additional 2.5x speedup.
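The per-query speedups and the total benchmark time on this slide measure different things. A quick check of the arithmetic, using only the numbers quoted above (the variable names are illustrative):

```python
# Whole-benchmark speedup implied by the totals quoted on the slide.
HOURS_PER_DAY = 24

before_hours = 7.8 * HOURS_PER_DAY   # 7.8 days before Stinger
after_hours = 9.3                    # 9.3 hours after Stinger

total_speedup = before_hours / after_hours
print(f"Total benchmark time speedup: {total_speedup:.1f}x")

# Per-query figures from the same slide, for comparison.
average_query_speedup = 52
maximum_query_speedup = 160
print(f"Average per-query speedup:    {average_query_speedup}x")
print(f"Maximum per-query speedup:    {maximum_query_speedup}x")
```

The whole-run ratio is weighted by how long each query takes, so it need not match the unweighted per-query average.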

Hive + Stinger at Yahoo
Scale: Around 1 million Hive jobs run every month.
Performance: Total benchmark time from 8.1 hours to 1.3 hours at 10TB scale.
Performance: Up to 82x faster.(4)

Stinger at Spotify
Speed: Query 25 TB of compressed data in 10 minutes across 690 nodes (MapReduce too slow to complete).
Efficiency: 16x less HDFS read when using ORCFile versus Avro.(5)

ORCFile at Facebook
Compression: Saved more than 1,400 servers' worth of storage.
Compression: Compression ratio increased from 5x to 8x globally.

Hive on Tez: Conclusion
Hive on Tez delivers fast batch and interactive SQL today.
But users need more speed!
Scale: Proven at petabyte scale.
SQL: The most comprehensive open-source SQL on Hadoop.
Speed: More than 90 Hortonworks customers use Hive-on-Tez today for fast SQL. (Hortonworks Customer Support metrics as of Feb/2015)

Next Stop: Stinger.next and Sub-Second SQL
The emergence of LLAP and Hive-on-Spark brings sub-second SQL within reach.
What does it take to get Hive to sub-second?
Does Hive-on-Spark get us there?

Performance Today and the Sub-Second Future
Hive on Tez, Hive on Spark, Hive on MapReduce & Spark-SQL

Query processing in Hadoop
SQL support: HiveQL, Spark-SQL
SQL Engine: Hive Engine, Spark-SQL SQL Engine
Distributed Execution Engine: MapReduce, Tez, Spark
Columnar Storage: ORC File, Parquet File
Cache: Block Cache, Linux Cache

Query processing in Hadoop
What is covered today in terms of performance

Performance comparison : Test bed
Software:
Component | Version
Hive | 1.2.0
Tez | 0.5.2
Spark | 1.2.0
Hadoop | 2.6.0
Hardware
20 physical nodes, each with:
● 2x Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz for total of 16 CPU cores/machine
● Hyper-threading enabled
● 256GB RAM per node
● 6x 4TB WDC WD4000FYYZ-0 drives per node
● 10 Gigabit interconnect between the nodes
Note: Based on the YARN Node Manager’s Memory Resource setting used below, only 128 GB of RAM per node was dedicated to query processing.
Performance benchmarks:
Execution engine | Primitives on 30TB scale factor | TPC-DS queries on 30TB scale factor | TPC-DS queries on 200GB scale factor
Spark | X | X | X
Tez | X | X | X
Map Reduce | X | |
Spark-SQL | X | X | X
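To put the test bed in perspective, the per-node hardware figures above can be rolled up into cluster totals. A small sketch using only the numbers on this slide; the constant names are illustrative, and two hardware threads per core is assumed for hyper-threading:

```python
# Per-node figures taken from the hardware list.
NODES = 20
PHYSICAL_CORES_PER_NODE = 16
RAM_GB_PER_NODE = 256
QUERY_RAM_GB_PER_NODE = 128      # share dedicated to query processing via YARN
DRIVES_PER_NODE = 6
DRIVE_CAPACITY_TB = 4

# Assumption: hyper-threading exposes 2 logical processors per physical core.
LOGICAL_PER_PHYSICAL = 2

cluster = {
    "physical_cores": NODES * PHYSICAL_CORES_PER_NODE,
    "logical_processors": NODES * PHYSICAL_CORES_PER_NODE * LOGICAL_PER_PHYSICAL,
    "ram_gb": NODES * RAM_GB_PER_NODE,
    "query_ram_gb": NODES * QUERY_RAM_GB_PER_NODE,
    "raw_disk_tb": NODES * DRIVES_PER_NODE * DRIVE_CAPACITY_TB,
}

for name, value in cluster.items():
    print(f"{name}: {value}")
```

Only half of the installed memory is available to the engines, which matters when reading the 30TB results.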

Performance comparison : Configurations
Hive on Tez
● 128GB of memory allocated
● 16 out of 32 logical processors allocated
● hive.execution.engine = tez
● hive.auto.convert.join.noconditionaltask.size = 600MB
● Vectorization enabled
● CBO enabled
● Fetch column stats enabled
Other settings
● hive.prewarm.numcontainers = 317
● hive.tez.auto.reducer.parallelism = true
Hive on Spark
allocated
● hive.execution.engine=spark
● Configuration parameters followed the recommendations from the Hive on Spark wiki (http://tinyurl.com/pk2ju8e), which also had CBO, vectorization, fetch column stats etc. enabled
● spark.master=yarn-master
Spark settings
● spark.shuffle.memoryFraction = 0.5
● spark.storage.memoryFraction = 0.1
● spark.shuffle.consolidateFiles = true
● spark.serializer = org.apache.spark.serializer.KryoSerializer
Spark-SQL
allocated
● spark.shuffle.memoryFraction = 0.5
● spark.storage.memoryFraction = 0.1
● spark.shuffle.consolidateFiles = true
● spark.serializer = org.apache.spark.serializer.KryoSerializer
● spark.sql.shuffle.partitions = 1009
● spark-sql --master yarn-client
● driver-memory 8g
● Default GC configuration
spark.sql.codegen was not enabled as it caused most queries to fail.
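The Hive on Tez settings listed above can be written down as session statements. A minimal sketch that renders them as `set key=value;` lines. The first four property names come from the slide; the last three are assumed to be the Hive properties behind "Vectorization enabled", "CBO enabled" and "Fetch column stats enabled", and the 600MB threshold is converted to bytes on the assumption that 1 MB = 1024 * 1024 bytes. The helper `to_set_statements` is hypothetical.

```python
# 600MB map-join threshold expressed in bytes (assumes 1 MB = 1024 * 1024 bytes).
MAP_JOIN_THRESHOLD_BYTES = 600 * 1024 * 1024

hive_on_tez_settings = {
    # Named explicitly on the slide.
    "hive.execution.engine": "tez",
    "hive.auto.convert.join.noconditionaltask.size": str(MAP_JOIN_THRESHOLD_BYTES),
    "hive.prewarm.numcontainers": "317",
    "hive.tez.auto.reducer.parallelism": "true",
    # Assumed property names for features the slide lists only as "enabled".
    "hive.vectorized.execution.enabled": "true",   # Vectorization enabled
    "hive.cbo.enable": "true",                     # CBO enabled
    "hive.stats.fetch.column.stats": "true",       # Fetch column stats enabled
}


def to_set_statements(settings):
    """Render a settings dict as Hive `set key=value;` session statements."""
    return [f"set {key}={value};" for key, value in settings.items()]


statements = to_set_statements(hive_on_tez_settings)
for statement in statements:
    print(statement)
```

Keeping the settings in one place makes it easier to rerun a comparison with the same configuration on each engine.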

Performance comparison : TPC-DS 200GB
● Warm timings reported, Cold queries on Spark are significantly slower
● Hive on Tez using ORC format
● Hive on Spark using Parquet format
● Spark-sql using Parquet format
Chart totals: Hive on Tez 1,118; Hive on Spark 1,982; Spark-SQL 1,235

Performance comparison : TPC-DS 200GB continued..
Hive on Tez is 77% faster than Hive on Spark and 10% faster than Spark-SQL.
Spark-SQL is 60% faster than Hive on Spark.
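The "X% faster" figures can be reproduced from the three chart totals. A short sketch, assuming the totals 1,118, 1,982 and 1,235 belong to Hive on Tez, Hive on Spark and Spark-SQL respectively (the only assignment consistent with the percentages quoted); `percent_faster` is an illustrative helper:

```python
def percent_faster(slower_total, faster_total):
    """Extra time the slower engine needs, as a percentage of the faster engine's time."""
    return (slower_total - faster_total) / faster_total * 100


# Assumed mapping of chart totals to engines.
totals = {"Hive on Tez": 1118, "Hive on Spark": 1982, "Spark-SQL": 1235}

tez_vs_hive_on_spark = percent_faster(totals["Hive on Spark"], totals["Hive on Tez"])
tez_vs_spark_sql = percent_faster(totals["Spark-SQL"], totals["Hive on Tez"])
spark_sql_vs_hive_on_spark = percent_faster(totals["Hive on Spark"], totals["Spark-SQL"])

print(f"Hive on Tez vs Hive on Spark: {tez_vs_hive_on_spark:.0f}% faster")
print(f"Hive on Tez vs Spark-SQL:     {tez_vs_spark_sql:.0f}% faster")
print(f"Spark-SQL vs Hive on Spark:   {spark_sql_vs_hive_on_spark:.0f}% faster")
```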

Performance comparison : TPC-DS 200GB summary
Even simple queries don’t run in sub-second.

Performance comparison : TPC-DS 200GB
● 200GB Scale factor, un-partitioned schema
● 45 unmodified queries from TPC-DS
● ORC format compression ratio 3.4x
● Parquet format compression ratio of 2.8x

Performance comparison : TPC-DS 30TB
Workload config
● 30 TB Scale factor
● ORC Table format
● Fact tables partitioned on *_date_sk
● Explicit partition filters were used for Hive on Spark and Spark-SQL (but not for Hive-on-Tez)
● 20 of the previously used queries were used; warm query timings reported
Highlights from 30TB TPC-DS test
● Hive on Tez outperforms Hive on Spark and Spark-SQL by up to 18x
● Hive on Spark completed 15 of the 20 queries; the remaining 5 errored out or were stuck in GC and got cancelled
● Spark-SQL completed 7 of the 20 queries; the remaining 13 either failed within a couple of minutes or errored out after running for hours
● Spark-SQL performance is negatively affected by inefficient query plans as it lacks a query optimizer

Chart totals: Hive on Tez 1,828; Hive on Spark 10,098
For large data sets, Hive on Tez is ~5x faster than Hive on Spark.

Performance comparison : TPC-DS 30TB continued
[Chart annotation: Failed Spark-SQL queries]

Performance comparison : TPC-DS 30TB Q17
[Chart annotation: Hive on Tez query ends here]

Why didn’t Spark take Hive to sub-second?
● Hive is CPU bound for most operations, especially after the introduction of columnar file formats (do more with less)
● Spark consumes more CPU, disk and network IO than Tez for relatively large datasets (Tez: 2x less disk IO, 4x less network IO, 6x less CPU)
● Hive on Spark spends a lot of time translating from RDDs to Hive’s “Row Containers”

I don’t believe what you just said!!!
Show me some queries I can understand...
Simple queries to understand complex systems
Execution engine Primitives

Performance comparison : What are those primitives?
Group | Test case | Comment
ETL | Create table as select * | Insert 8 Billion rows, 570 GB of Data
ETL | Create table as select with Group by | Group by and Insert 8 Billion rows, 570 GB of Data
ETL | Create table as with Group by on all columns followed by cluster by | Group by, cluster by and Insert 8 Billion rows, 570 GB of Data
Group by | Group by on primary key | Group by 25 billion distinct keys
Group by | Group by on column with low NDV* | Group by 82 billion rows with 8K distinct keys
Map join | store_sales x item | Map join 28 Billion x 462K
Map join | store_sales x item x store | Map join 28 Billion x 462K x 1.7K
Map join | store_sales x item x store x customer_demographics | Map join 28 Billion x 462K x 1.7K x 1.9 Million
Shuffle Join | Shuffle join | Shuffle join 8.6 Billion x 706 Million rows
Shuffle Join | Shuffle join + Group by on primary key | Shuffle join 8.6 Billion x 706 Million rows followed by group by on 675 Million rows
* NDV: Number of distinct values
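The map join and shuffle join primitives in the table differ in how matching rows are brought together. A toy, single-process sketch of the two strategies, assuming the usual meaning of the terms: a map join hashes the small table and streams the large one past it, while a shuffle join first repartitions both inputs by the join key. The function names, the shared `item_sk` key and the sample rows are illustrative and not taken from the benchmark.

```python
from collections import defaultdict


def map_join(big_rows, small_rows, key):
    """Hash the small table once, then stream the big table past the hash table."""
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)
    return [{**big, **small} for big in big_rows for small in lookup.get(big[key], [])]


def shuffle_join(left_rows, right_rows, key, partitions=4):
    """Repartition both inputs by join key, then join each partition independently."""
    left_parts = defaultdict(list)
    right_parts = defaultdict(list)
    for row in left_rows:
        left_parts[hash(row[key]) % partitions].append(row)
    for row in right_rows:
        right_parts[hash(row[key]) % partitions].append(row)
    joined = []
    for part in range(partitions):
        # Rows with the same key always land in the same partition on both sides.
        joined.extend(map_join(left_parts[part], right_parts[part], key))
    return joined


# Illustrative sample data standing in for a fact table and a dimension table.
store_sales = [
    {"item_sk": 1, "quantity": 2},
    {"item_sk": 2, "quantity": 1},
    {"item_sk": 1, "quantity": 5},
    {"item_sk": 3, "quantity": 7},
]
item = [{"item_sk": 1, "name": "a"}, {"item_sk": 2, "name": "b"}]

map_result = map_join(store_sales, item, "item_sk")
shuffle_result = shuffle_join(store_sales, item, "item_sk")
print(len(map_result), len(shuffle_result))
```

Both strategies return the same rows. The difference is the data movement: the map join moves only the small table, while the shuffle join moves both inputs, which is why the shuffle tests stress the execution engine's network and disk paths.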

Performance comparison : CTAS
Create table test_table as select * from store_returns;
Plan operators: Table Scan (store_returns, 8 Billion rows); Table Insert (8 Billion rows)
Execution engine | Elapsed time (Seconds) | Tez Gain %
Hive on Tez | 316 |
Hive on Spark | 351 | 11%
Hive on Mapreduce | 494 | 56%
Spark-SQL | 418 | 32%

Performance comparison : CTAS
Tez is 11% faster than Spark, 56% faster than MapReduce and 32% faster than Spark-SQL.

Performance comparison : CTAS with group by
Create table test_table as select * from store_returns group by *;
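Plan operators: Table Scan (store_returns, 8 Billion rows); Group by (on all columns, 7 billion rows); Shuffle (on all columns, 8 Billion rows); Table Insert (4 Billion rows)
Execution engine | Elapsed time (Seconds) | Tez Gain %
Hive on Tez | 630 |
Hive on Spark | 1,608 | 155%
Spark-SQL | 1,202 | 91%
Chart values: 630; 1,608; 840; 1,202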

This time, the execution engine must prepare, shuffle and aggregate data.


Performance comparison : Select + group by on PK
select count(*) rowcount from store_sales group by ss_item_sk , ss_ticket_number having rowcount > 100000000
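Plan operators: Table Scan (25 Billion rows); Group by (25 billion rows); Shuffle (25 Billion rows); Filter operator (25 billion rows); Select (0 rows qualify)
Execution engine | Elapsed time (Seconds) | Tez Gain %
Hive on Tez | 457 |
Spark-SQL | 862 | 89%
Chart values: 457; 2,966; 893; 862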

Group-By performed on all 25 billion distinct keys.
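The shape of this primitive can be tried on a toy table: because every (ss_item_sk, ss_ticket_number) pair is distinct, the engine has to build one group per row, yet the HAVING clause lets nothing through, so the result set stays empty. A small sketch using Python's built-in sqlite3 module as a stand-in engine; the 12 sample rows and the lowered threshold of 1 are illustrative assumptions, not benchmark data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table store_sales (ss_item_sk integer, ss_ticket_number integer)")

# 3 items x 4 tickets: every (item, ticket) pair occurs exactly once, like a primary key.
rows = [(item, ticket) for item in range(1, 4) for ticket in range(1, 5)]
conn.executemany("insert into store_sales values (?, ?)", rows)

# Without the HAVING clause: one group per input row.
groups = conn.execute(
    "select count(*) rowcount from store_sales "
    "group by ss_item_sk, ss_ticket_number"
).fetchall()

# Same shape as the slide's query, with the threshold lowered to fit the toy data.
qualifying = conn.execute(
    "select count(*) rowcount from store_sales "
    "group by ss_item_sk, ss_ticket_number having rowcount > 1"
).fetchall()

print(len(groups), "groups,", len(qualifying), "rows returned")
conn.close()
```

The query therefore measures scan, shuffle and aggregation cost without the time spent returning results to the client.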


Performance comparison : Select + group by on low NDV
select sum(ss_list_price) from store_sales group by ss_sold_date_sk having sum(ss_list_price) = 1
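Plan operators: Table Scan (85 Billion rows); Group by (85 billion rows); Filter operator (8K rows); Select (0 rows qualify)
Execution engine | Elapsed time (Seconds) | Tez Gain %
Hive on Tez | 51 |
Spark-SQL | 164 | 221%
Chart values: 51; 290; 56; 164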

Performance comparison : Select + group by on low NDV
Hive on Tez and Hive on Spark outperform Spark-SQL.

Performance comparison : Map join with 1,2 & 3 tables
select count(*) from store_sales, item, store, customer_demographics where i_item_sk = ss_item_sk and s_store_sk = ss_store_sk and ss_cdemo_sk = cd_demo_sk
Plan operators: Table Scan (store_sales, 28 Billion rows); Table Scan (item, 472K rows); Table Scan (store, 1.7K rows); Table Scan (customer_demographics, 1.9 Million rows); Map join (27 Billion rows); Map join (27 Billion rows); Map join (27 Billion rows)
Execution engine | Map join #1 | Map join #2 | Map join #3 | Tez Join #1 Gain % | Tez Join #2 Gain % | Tez Join #3 Gain %
Hive on Tez | 108 | 145 | 232 | | |
Hive on Spark | 106 | 142 | 289 | 98% | 98% | 125%
Hive on Mapreduce | 247 | 280 | 800 | 228% | 193% | 345%
Spark-SQL | 86 | 117 | 166 | -20% | -20% | -28%
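The gain columns in this table do not all appear to use the same formula. A quick check, using only the elapsed times above, compares two candidate definitions: the relative difference to Tez (which reproduces the gain figures on the CTAS and shuffle join slides) and the plain ratio to Tez. The function names are illustrative.

```python
tez_seconds = [108, 145, 232]
elapsed_seconds = {
    "Hive on Spark": [106, 142, 289],
    "Hive on Mapreduce": [247, 280, 800],
    "Spark-SQL": [86, 117, 166],
}


def relative_difference_pct(engine, tez):
    """Extra time relative to Tez: positive means Tez is faster."""
    return (engine - tez) / tez * 100


def ratio_pct(engine, tez):
    """Engine time as a percentage of Tez time."""
    return engine / tez * 100


results = {}
for name, times in elapsed_seconds.items():
    results[name] = {
        "relative_difference": [round(relative_difference_pct(t, z)) for t, z in zip(times, tez_seconds)],
        "ratio": [round(ratio_pct(t, z)) for t, z in zip(times, tez_seconds)],
    }
    print(name, results[name])
```

Within a percentage point, the Hive on Spark and Hive on Mapreduce rows match the ratio, while the Spark-SQL row matches the relative difference. Under the relative-difference definition, Hive on Spark would be roughly level with Tez on the first two joins and about 25% slower on the third.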

Performance comparison : Map join with 1,2 & 3 tables
Spark-SQL is faster than Hive on Tez and Hive on Spark for Map-joins.

Performance comparison : Shuffle join + group by
● select count(*) from store_sales a, store_returns b where a.ss_item_sk = b.sr_item_sk and a.ss_ticket_number = b.sr_ticket_number
● select count(*) rowcount from store_sales a, store_returns b where a.ss_item_sk = b.sr_item_sk and a.ss_ticket_number = b.sr_ticket_number group by ss_item_sk, ss_ticket_number having rowcount > 1
Plan operators: Table Scan (8.6 Billion rows); Table Scan (6 Million rows); Shuffle Join (9 Billion rows); Group by (675 Million rows); Filter (675 Million rows); Select (0 rows)
Execution engine | Shuffle join | Shuffle join + group by | Tez Shuffle Gain % | Tez Gain %
Hive on Tez | 400 | 453 | |
Hive on Spark | 1,078 | 1,120 | 170% | 147%
Hive on Mapreduce | 756 | 826 | 89% | 82%
Spark-SQL | 1,835 | 1,884 | 359% | 316%


Why are shuffles so slow for Hive on Spark and Spark-SQL?

Performance comparison : Shuffle join cluster CPU utilization

[Chart: cluster CPU utilization during the shuffle join, annotated where the Hive on Tez query ends and where the Hive on Spark query ends]

Performance comparison : Primitive results summary

Performance comparison : Performance summary
+ Short running query
+ ETL
+ Large joins and aggregates
Slower than Spark-SQL in Map joins
High GC
Instability
SQL support limited compared to Hive
Lack of sophisticated query optimizer
+ Efficient resource utilization
+ Map join performance
Large Joins
+ Outperforms Spark-SQL in large join
Slower than Tez for large joins and aggregates
High GC
Engines compared: Hive on Tez, Spark-SQL, Hive on Spark, MapReduce
+ Promising initial release

Solving Hive’s Top Performance Challenges

Apache Hive: Modern Architecture
SQL: SQL Support (SQL:2011, Optimizer, HCatalog, HiveServer2)
Engine: SQL Engines (Row Engine, Vector Engine)
Distributed Execution: Hadoop 1 (MapReduce); Hadoop 2 (Tez, Spark)
Persistent Server: LLAP
Cache: Block Cache, Linux Cache, Vector Cache
Storage: Columnar Storage (ORCFile, Parquet); Unstructured Data (JSON, CSV, Text, Avro, Custom, Weblog)
Legend: Historical, Current, In Development

Apache Hive: Getting to Sub-Second Improvement
LLAP: Persistent servers cache vectors and start queries instantly. Pluggable integrations with Tez or Spark.

Vectorized Hash Join solves CPU boundedness for Hive on Tez or on Spark.

Improved metadata catalog allows instant query planning and optimization for any engine.

Apache Hive’s Sub-Second Future
Metadata (Fast, Scalable Metadata Catalog) + Persistent Server (LLAP) + SQL Engine (Vectorized Hash Join) + Choice of Execution Engines (Tez or Spark) = Sub-Second Hive

Questions?
Interested? Stop by the Hortonworks booth to learn more

Endnotes
(1) https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/
(2) https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920
(3) http://hortonworks.com/blog/benchmarking-apache-hive-13-enterprise-hadoop/
(4) http://yahoodevelopers.tumblr.com/post/85930551108/yahoo-betting-on-apache-hive-tez-and-yarn
(5) http://www.slideshare.net/AdamKawa/a-perfect-hive-query-for-a-perfect-meeting-hadoop-summit-2014
