Join optimization in hive

•Download as PPTX, PDF•

14 likes•5,989 views

Liyin Tang

Outline Map Join Optimization Previous Common Join and Map Join Optimized Map Join JDBM Performance Evaluation Convert Join to Map Join Automatically How it works Performance Evaluation

Common Join Task A Table X Table Y Common Join Task Mapper Mapper Mapper … Mapper … Mapper … Mapper Shuffle Reducer

Previous Map Join Task A Small Table Data MapJoin Task Mapper … … Record Mapper … … … Big Table Data Record Mapper … … … Record Record Record Task C … …

Optimized Map Join Small Table Data Small Table Data Small Table Data Task A Upload files to DC HashTable Files HashTable Files HashTable Files MapReduce Local Task Distributed Cache MapJoin Task Mapper … Record Mapper … Record Mapper … Big Table Data Record Record … … Task C

JDBM JDBM is too heavy weight for Map Join Take more than 70% CPU time Generate very large file No need to use persistent hashtable for map join

Converting Common Join into Map Join Task A Task A Conditional Task a CommonJoinTask MapJoinLocalTask MapJoinLocalTask MapJoinLocalTask b CommonJoinTask . . . . . Task C c MapJoinTask MapJoinTask MapJoinTask Previous Execution Flow Task C Optimized Execution Flow

Compile Time SELECT * FROM SRC1 x JOIN SRC2 y ON x.key = y.key; Task A a Conditional Task Assume TABLE x is the big table Assume TABLE y is the big table MapJoinLocalTask MapJoinLocalTask CommonJoinTask MapJoinTask MapJoinTask Task C

Execution Time SELECT * FROM SRC1 x JOIN SRC2 y ON x.key = y.key; Task A Both tables are too big for map join Table X is the big table a Conditional Task MapJoinLocalTask CommonJoinTask MapJoinTask Task C

Backup Task Task A Conditional Task Memory Bound MapJoinLocalTask Run as a Backup Task CommonJoinTask MapJoinTask Task C

Performance Bottleneck Distributed Cache is the potential performance bottleneck Large hashtable file will slow down the propagation of Distributed Cache Mappers are waiting for the hashtables file from Distributed Cache Compress and archive all the hashtable file into a tar file.

Compress and Archive Small Table Data Small Table Data Small Table Data Task A Compressed & Archived a HashTable Files HashTable Files HashTable Files MapReduce Local Task Distributed Cache Mapper … Record Mapper … Record Mapper … Big Table Data Record b Record MapJoin Task … … Task C

Future Work Audit how many join will be converted into map join in the cluster. Set hashtable file replica number based on the number of Mappers Tune the limit of small table data size by sampling Increase the in-memory hashtable capacity.

What's hot

Map reduce presentationateeq ateeq

Map ReduceRahul Agarwal

HiveSrinath Reddy

Introduction To Map Reducerantav

Map ReduceMichel Bruley

An Introduction To Map-ReduceFrancisco Pérez-Sorrosal

Hadoop Map ReduceVNIT-ACM Student Chapter

[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)npinto

Introduction to Map-ReduceBrendan Tierney

MapReduce basicChirag Ahuja

Map reduce and Hadoop on windowsMuhammad Shahid

Map ReduceManuel Correa

Hive Percona 2009prasadc

Hw09 Hadoop Development At Facebook Hive And HdfsCloudera, Inc.

Analysing of big data using map reducePaladion Networks

Hadoop Summit 2009 HiveZheng Shao

Hadoop Map Reduce ArchJeff Hammerbacher

Mastering Hadoop Map Reduce - Custom Types and Other Optimizationsscottcrespo

Map Reduceschapht

Stratosphere with big_data_analyticsAvinash Pandu

What's hot (20)

Map reduce presentation

Map Reduce

Hive

Introduction To Map Reduce

Map Reduce

An Introduction To Map-Reduce

Hadoop Map Reduce

[Harvard CS264] 08b - MapReduce and Hadoop (Zak Stone, Harvard)

Introduction to Map-Reduce

MapReduce basic

Map reduce and Hadoop on windows

Map Reduce

Hive Percona 2009

Hw09 Hadoop Development At Facebook Hive And Hdfs

Analysing of big data using map reduce

Hadoop Summit 2009 Hive

Hadoop Map Reduce Arch

Mastering Hadoop Map Reduce - Custom Types and Other Optimizations

Map Reduce

Stratosphere with big_data_analytics

Viewers also liked

Hive Correlation OptimizerYin Huai

Hive contributors meetup apache sentryBrock Noland

Ten tools for ten big data areas 04_Apache HiveWill Du

Hadoop World 2011: Replacing RDB/DW with Hadoop and Hive for Telco Big Data -...Cloudera, Inc.

Optimizing Hive QueriesDataWorks Summit

Hive ppt (1)marwa baich

Hive User Meeting August 2009 Facebookragho

Jump Start with Apache Spark 2.0 on DatabricksDatabricks

Spark Summit Europe 2016 Keynote - Databricks CEO Databricks

Internal HiveRecruit Technologies

Apache Spark and Online Analytics Databricks

How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit

Spark Summit EU 2016: The Next AMPLab: Real-time Intelligent Secure ExecutionDatabricks

A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016 Databricks

Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0Databricks

Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.

Hive tuningMichael Zhang

A look under the hood at Apache Spark's API and engine evolutionsDatabricks

Insights Without Tradeoffs: Using Structured StreamingDatabricks

Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...Databricks

Viewers also liked (20)

Hive Correlation Optimizer

Hive contributors meetup apache sentry

Ten tools for ten big data areas 04_Apache Hive

Hadoop World 2011: Replacing RDB/DW with Hadoop and Hive for Telco Big Data -...

Optimizing Hive Queries

Hive ppt (1)

Hive User Meeting August 2009 Facebook

Jump Start with Apache Spark 2.0 on Databricks

Spark Summit Europe 2016 Keynote - Databricks CEO

Internal Hive

Apache Spark and Online Analytics

How to understand and analyze Apache Hive query execution plan for performanc...

Spark Summit EU 2016: The Next AMPLab: Real-time Intelligent Secure Execution

A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016

Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0

Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5

Hive tuning

A look under the hood at Apache Spark's API and engine evolutions

Insights Without Tradeoffs: Using Structured Streaming

Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...

Similar to Join optimization in hive

White paper hadoop performancetuningAnil Reddy

Map and ReduceChristopher Schleiden

Mapreduce advancedChirag Ahuja

Map ReducePrashant Gupta

Applying stratosphere for big data analyticsAvinash Pandu

mapreduce.pptxShimoFcis

Dache: A Data Aware Caching for Big-Data using Map Reduce frameworkSafir Shah

Hadoopdevakalyan143

MAP REDUCE IN DATA SCIENCE.pptxHARIKRISHNANU13

Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...jencyjayastina

Map Reduce Execution Architecture Rupak Roy

Advanced Hadoop Tuning and Optimization - Hadoop ConsultingImpetus Technologies

FinalprojectpresentationSANTOSH WAYAL

Map Reduce OnlineHadoop User Group

ch02-mapreduce.pptxGiannisPagges

Hadoop eco system with mapreduce hive and pigKhanKhaja1

MapReduceKavyaGo

Map reducefunnyslideletstalkbigdata

Hadoop 3shams03159691010

Hadoop 2EasyMedico.com

Similar to Join optimization in hive (20)

White paper hadoop performancetuning

Map and Reduce

Mapreduce advanced

Map Reduce

Applying stratosphere for big data analytics

mapreduce.pptx

Dache: A Data Aware Caching for Big-Data using Map Reduce framework

Hadoop

MAP REDUCE IN DATA SCIENCE.pptx

Introduction to map reduce s. jency jayastina II MSC COMPUTER SCIENCE BON SEC...

Map Reduce Execution Architecture

Advanced Hadoop Tuning and Optimization - Hadoop Consulting

Finalprojectpresentation

Map Reduce Online

ch02-mapreduce.pptx

Hadoop eco system with mapreduce hive and pig

MapReduce

Map reducefunnyslide

Hadoop 3

Hadoop 2

Join optimization in hive

1. Join Optimization in Hive Liyin Tang

2. Outline Map Join Optimization Previous Common Join and Map Join Optimized Map Join JDBM Performance Evaluation Convert Join to Map Join Automatically How it works Performance Evaluation

3. Common Join Task A Table X Table Y Common Join Task Mapper Mapper Mapper … Mapper … Mapper … Mapper Shuffle Reducer

4. Previous Map Join Task A Small Table Data MapJoin Task Mapper … … Record Mapper … … … Big Table Data Record Mapper … … … Record Record Record Task C … …

5. Optimized Map Join Small Table Data Small Table Data Small Table Data Task A Upload files to DC HashTable Files HashTable Files HashTable Files MapReduce Local Task Distributed Cache MapJoin Task Mapper … Record Mapper … Record Mapper … Big Table Data Record Record … … Task C

6. JDBM JDBM is too heavy weight for Map Join Take more than 70% CPU time Generate very large file No need to use persistent hashtable for map join

7. Performance Evaluation I

8. Converting Common Join into Map Join Task A Task A Conditional Task a CommonJoinTask MapJoinLocalTask MapJoinLocalTask MapJoinLocalTask b CommonJoinTask . . . . . Task C c MapJoinTask MapJoinTask MapJoinTask Previous Execution Flow Task C Optimized Execution Flow

9. Compile Time SELECT * FROM SRC1 x JOIN SRC2 y ON x.key = y.key; Task A a Conditional Task Assume TABLE x is the big table Assume TABLE y is the big table MapJoinLocalTask MapJoinLocalTask CommonJoinTask MapJoinTask MapJoinTask Task C

10. Execution Time SELECT * FROM SRC1 x JOIN SRC2 y ON x.key = y.key; Task A Both tables are too big for map join Table X is the big table a Conditional Task MapJoinLocalTask CommonJoinTask MapJoinTask Task C

11. Backup Task Task A Conditional Task Memory Bound MapJoinLocalTask Run as a Backup Task CommonJoinTask MapJoinTask Task C

12. Performance Bottleneck Distributed Cache is the potential performance bottleneck Large hashtable file will slow down the propagation of Distributed Cache Mappers are waiting for the hashtables file from Distributed Cache Compress and archive all the hashtable file into a tar file.

13. Compress and Archive Small Table Data Small Table Data Small Table Data Task A Compressed & Archived a HashTable Files HashTable Files HashTable Files MapReduce Local Task Distributed Cache Mapper … Record Mapper … Record Mapper … Big Table Data Record b Record MapJoin Task … … Task C

14. Performance Evaluation II

15. Performance Evaluation III

16. Future Work Audit how many join will be converted into map join in the cluster. Set hashtable file replica number based on the number of Mappers Tune the limit of small table data size by sampling Increase the in-memory hashtable capacity.

17. Thank you Liyin Tang

Editor's Notes

A common join in hive will involve a Map stage and a Reduce stageAs we all know, shuffle stage before reducer is expensive, they need to sort and merge the intermediate file. So we tried to avoid this stage whenever is possible.
That’s the motivation of the map join.When one of the table is small enough to fit into the memory, so all the Mapper can hold the data in memory and do the join work in memory.So in this way, there is no shuffle/reduce stage is needed.That is how the previous map join works However the previous map join does not scaleThousands of Mapper read the small table from HDFS into memory, it will easily cause the small table to be the performance bottleneckAlso they will get read time out when the small table data became to be the hot spot. So that’s the problem of the previous map join
I tried to solve this problemWe create a map reduce local task, which will run locally and read the small table data into memory and serialize the hashtable into files.After that, it will upload the files into Distributed CacheWhen the Map Join Task is launched, the DC will propagate the hashtable files to each mapper’s local file system.And each mapper will load them back into the memory and do the join work as before.By doing this, we need to read the small table data only once. Also we use the DC to push the data to Mapper, instead of pulling data from HDFS by mapper itself. The difference is if multiple mappers runs on the same machine, DC only needs to push the data once.
Another optimization is to remove JDBM component from hive.JDBM is the persistent hash table used in Hive.Whenever the in-memory hashtable cannot hold data any more, it will swap the key/value into the JDBM table. It’s like a backup storageBy profiling, we found out JDBM is too heavy weight for map join.Take more than 70% CPU time when call the get function from JDBM.The generated Hashtable file is too large to propagate in DCActually, there is no need to use persistent hash table for map join. If the table is too large to fit in the memory, they should not run the query as a map join.Right now, we have totally removed this component from hive.
We run several benchmark to how much performance improvement after optimization.The result of benchmark shows the new optimized map join will be 12~26 times faster the previous one.I have to mention that the performance improvement is not only because of introducing the Distributed Cache to propagate the hashtable file,But also we have optimized the map join code and remove a very heavy weight persistent hashtable component from Hive.
since the new map join has a very good performance, Hive should try to run map join instead of common join whenever is possible.Previously, if user wants to do the map join, he needs to give the hints in query to assign which table is the small table 2) So the mapper can hold that data in the memory.3)But basically, not all of the users will givethis hint or users may gave a wrong hint in the query.4) So getting the hint from user is not good for user experience and query performance.6) So my work is to automatically and dynamically convert the common join into map join during the run time.7) Automatic means users don’t need to give the hint in the query any more.8) Dynamic means in the compile time, Hive will generate a series of execution flow, each of the execution flow covers one possible situation. (The number of execution path is bounded by the number of join tables.)12) During the execution, Hive will choose the most efficient execution path to run based on input file size.
Let’s take an example: There are 2 tables join together on a join key. Let’s say table x and table y.So during the compile time, it will generate 3 execution paths. First execution path will do the map join by assuming table x is the big table and loading the other tables into memory, which is the table y.The second will also the do the map join by assuming table y is the big table.And finally, 3rd execution path assume both of the table is too large to fit into memory, not feasible for map join, it will run the original common join.Why we are doing this? It’s because we don’t the data size of join table during compile time. Some of the join table may be the intermediate tables, which is generated from some sub query at run time.We only know the data size during execution time.
Let’s see what will happen during the execution time.When task A is finished, we exactly know the data size of each join table.If table X is the big table and the other table is small enough to fit into memory, Hive will run this execution path.However, if none of the tables can be load into memory, Hive will run the original common join execution path.By doing this, Hive can dynamically choose the most efficient execution path to run at execution stage.
Since the local task needs to load the data into memory, it is a very memory intensive task.So hivewill launch this local task in a child jvm, which has the same heap size as the Mapper's.Right now, We have already carefully bounded the input data size of the small tables, but there is still possible that the local task may run out of memory.Sothe query processor will measure the memory usage of the local task very carefully. Once the memory usage of the Local Task is higher than a threshold, this Local Task will abort itself, which means the table is too large fit into memory and map join fails.In this case, the query processor will switch back the original Common Join task as a Backup Task to run, All of them is totally transparent to user.
Let’s discuss the performance bottleneck. Previously, the small table file is the performance bottleneck,However, in the new map join, the distributed cache is the potential performance bottleneckIf you upload large hashtable file into DC, say larger than 30 M, it will really slow down the propagation speed of DC.You will see all the Mappers are launched but they will be in the initialization stage for a long time, which means they are waiting for the push from DC.Right now, the solution is very straightforward and walk around , we compress and archive all the hashtable files into tar file.
We have run several performance benchmark to compare between with the compression and without compression.From the result, we can see the compression can help to improve the performance by 21 % ~ 86 %.Also we can see the larger the input data size is, the more performance we can get by compression, which is very reasonable.The larger the small table is, the large hashtable file it will generated.The larger the big table is, the more mappers will be launched.Both of these 2 factors will contributes to making the DC to be bottleneck problem.And compression can help to solve this situation.
1) Finally,let’s see the performance comparison between previous common join with the new optimized common join, which is converted map join.2) Because all the tested benchmark is valid to convert into map join. 3) The result shows the join performance will be improved by 57% to 163%, if the join can be converted into map join.
There are several future works to follow. First, when this new optimized Join is running in the cluster, it would be better to know how many join operation is converted in Map Join and how many converted Map Join fails because it runs out of memory.We have already developed the hooks to audit but still need time to deploy in the cluster.Another thing is to set up the number of replications for the compressed hashtable files based on the number of Mappers. Currently, the number of replications for thehashtable files is 3. If we can set the replication number based on number of mappers, it will improve the propagation speed.we manually bounded the table size of small table to be 25M, which may be a little conservative. If we can tune this parameters by sampling the data, we will get more accurate limit of map join and more queries can be convert into map join.Finally,the local task can hold 2M unique key/value in the memory by consuming 1.47G memory space.By optimization to be more memory efficient, the local task can hold more data in memory. So more join queries can be converted into Map Join.

Join optimization in hive

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Join optimization in hive

Similar to Join optimization in hive (20)

Join optimization in hive

Editor's Notes