Uploaded by Liyin Tang

Join optimization in hive

This document discusses optimization techniques for map join in Hive. It describes: 1) The previous common join and map join in Hive and their limitations. 2) The optimized map join, in which a local task builds hashtable files from the small table and uploads them to the Distributed Cache, so that mappers join locally and the shuffle is avoided. 3) The performance problems caused by using JDBM for hashtables, which led to JDBM being removed from Hive. 4) Automatic conversion of common joins into map joins at run time, based on table sizes, using a conditional task. 5) Compressing and archiving the hashtable files before they go through the Distributed Cache, to reduce propagation overhead. 6) Performance evaluations showing the improvements from the optimized techniques.

Join Optimization in Hive
Liyin Tang

Outline
- Map Join Optimization
  - Previous Common Join and Map Join
  - Optimized Map Join
  - JDBM
  - Performance Evaluation
- Convert Join to Map Join Automatically
  - How it works
  - Performance Evaluation

Common Join
[Diagram: Task A; Table X and Table Y; Common Join Task with Mappers, Shuffle, and Reducer]

Previous Map Join
[Diagram: Task A; MapJoin Task whose Mappers each read the Small Table Data and records of the Big Table Data; Task C]

Optimized Map Join
[Diagram: Task A; MapReduce Local Task that turns Small Table Data into HashTable Files and uploads the files to the Distributed Cache (DC); MapJoin Task whose Mappers read records of the Big Table Data; Task C]

JDBM
- JDBM is too heavyweight for map join
- Takes more than 70% of CPU time
- Generates a very large file
- No need to use a persistent hashtable for map join

Performance Evaluation I

Converting Common Join into Map Join
[Diagram: Previous Execution Flow: Task A, CommonJoinTask, Task C. Optimized Execution Flow: Task A, Conditional Task with branches a, b, c, ... made up of a CommonJoinTask and several MapJoinLocalTask/MapJoinTask pairs, Task C]

Compile Time
SELECT * FROM SRC1 x JOIN SRC2 y ON x.key = y.key;
[Diagram: Task A; Conditional Task with three branches: MapJoinLocalTask and MapJoinTask assuming table x is the big table; MapJoinLocalTask and MapJoinTask assuming table y is the big table; CommonJoinTask; Task C]

Execution Time
SELECT * FROM SRC1 x JOIN SRC2 y ON x.key = y.key;
[Diagram: Task A; Conditional Task that selects either MapJoinLocalTask and MapJoinTask (table X is the big table) or CommonJoinTask (both tables are too big for map join); Task C]

Backup Task
[Diagram: Task A; Conditional Task; MapJoinLocalTask (memory bound) and MapJoinTask, with CommonJoinTask run as a backup task; Task C]

Performance Bottleneck
- The Distributed Cache is the potential performance bottleneck
- A large hashtable file slows down the propagation of the Distributed Cache
- Mappers wait for the hashtable files from the Distributed Cache
- Compress and archive all the hashtable files into a tar file

Compress and Archive
[Diagram: Task A; MapReduce Local Task that turns Small Table Data into HashTable Files, which are compressed and archived before going into the Distributed Cache; MapJoin Task whose Mappers read records of the Big Table Data; Task C]

Performance Evaluation II

Performance Evaluation III

Future Work
- Audit how many joins are converted into map joins in the cluster
- Set the hashtable file replica number based on the number of mappers
- Tune the limit on small table data size by sampling
- Increase the in-memory hashtable capacity

Thank you
Liyin Tang

Editor's Notes

#4 A common join in Hive involves a map stage and a reduce stage. As we all know, the shuffle stage before the reducers is expensive, because the intermediate files have to be sorted and merged. So we try to avoid this stage whenever possible.
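
To make the cost of the shuffle concrete, here is a minimal in-memory sketch of a reduce-side (common) join. It is only an illustration, not Hive's implementation: the function names, the tuple-based rows, and the table tags "x" and "y" are all assumptions made for this example.

```python
from collections import defaultdict


def map_phase(table_name, rows, key_index=0):
    """Tag every row with its source table so the reducer can tell them apart."""
    for row in rows:
        yield row[key_index], (table_name, row)


def shuffle(tagged_pairs):
    """Group tagged rows by join key; sorting by key stands in for the sort/merge step."""
    groups = defaultdict(list)
    for key, tagged_row in tagged_pairs:
        groups[key].append(tagged_row)
    return sorted(groups.items(), key=lambda kv: kv[0])


def reduce_phase(grouped, left_name, right_name):
    """For each key, pair every row of the left table with every row of the right table."""
    for _key, tagged_rows in grouped:
        left = [row for tag, row in tagged_rows if tag == left_name]
        right = [row for tag, row in tagged_rows if tag == right_name]
        for l_row in left:
            for r_row in right:
                yield l_row + r_row


def common_join(x_rows, y_rows):
    """Run map, shuffle, and reduce in sequence for two tables joined on column 0."""
    pairs = list(map_phase("x", x_rows)) + list(map_phase("y", y_rows))
    return list(reduce_phase(shuffle(pairs), "x", "y"))
```

Every row of both tables passes through `shuffle`, which is the step the map join is designed to skip.
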
#5 That is the motivation for the map join. When one of the tables is small enough to fit into memory, every mapper can hold that data in memory and do the join work there, so no shuffle/reduce stage is needed. That is how the previous map join works. However, the previous map join does not scale. When thousands of mappers read the small table from HDFS into memory, the small table easily becomes the performance bottleneck. The mappers also get read timeouts when the small table data becomes a hot spot. So that is the problem with the previous map join.
#6 I tried to solve this problem. We create a MapReduce local task, which runs locally, reads the small table data into memory, and serializes the hashtable into files. After that, it uploads the files into the Distributed Cache (DC). When the map join task is launched, the DC propagates the hashtable files to each mapper's local file system, and each mapper loads them back into memory and does the join work as before. By doing this, we need to read the small table data only once. Also, we use the DC to push the data to the mappers, instead of having each mapper pull the data from HDFS itself. The difference is that if multiple mappers run on the same machine, the DC only needs to push the data once.
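
The build-once, load-everywhere flow described in this note can be sketched as follows. This is a simplified illustration under stated assumptions: `pickle` stands in for Hive's own hashtable serialization, a local temporary file stands in for the Distributed Cache, and all function names are hypothetical.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def local_task_build_hashtable(small_rows, path, key_index=0):
    """Local task: read the small table once, build a hashtable, serialize it to a file."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key_index]].append(row)
    with open(path, "wb") as f:
        pickle.dump(dict(table), f)


def mapper_join(big_rows, path, key_index=0):
    """Mapper: load the hashtable file from local disk and probe it for each big-table row."""
    with open(path, "rb") as f:
        table = pickle.load(f)
    for row in big_rows:
        for match in table.get(row[key_index], []):
            yield row + match


def demo():
    """Build the hashtable file once, then join a split of the big table against it."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "small_table.hashtable")
        local_task_build_hashtable([(1, "a"), (2, "b")], path)
        return list(mapper_join([(1, "x"), (3, "y"), (1, "z")], path))
```

In this sketch the small table is scanned exactly once, in `local_task_build_hashtable`; each mapper only deserializes the resulting file, and no shuffle or reduce step is involved.
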
#7 Another optimization is to remove the JDBM component from Hive. JDBM is the persistent hashtable used in Hive. Whenever the in-memory hashtable cannot hold any more data, it swaps the key/value pairs into the JDBM table, which acts like backup storage. By profiling, we found out that JDBM is too heavyweight for map join: calls to JDBM's get function take more than 70% of the CPU time, and the generated hashtable file is too large to propagate in the DC. Actually, there is no need to use a persistent hashtable for map join. If the table is too large to fit in memory, the query should not run as a map join. Right now, we have totally removed this component from Hive.
#8 We ran several benchmarks to see how much the performance improved after the optimization. The results show that the new optimized map join is 12 to 26 times faster than the previous one. I have to mention that the improvement comes not only from introducing the Distributed Cache to propagate the hashtable files, but also from optimizing the map join code and removing a very heavyweight persistent hashtable component from Hive.
#9 Since the new map join performs very well, Hive should try to run a map join instead of a common join whenever possible. Previously, if a user wanted a map join, they had to give a hint in the query to say which table is the small table, so that the mappers could hold that data in memory. But not all users give this hint, and users may give a wrong hint in the query. So relying on a hint from the user is bad for both user experience and query performance. So my work is to automatically and dynamically convert the common join into a map join at run time. Automatic means users don't need to give the hint in the query any more. Dynamic means that at compile time Hive generates a series of execution paths, each of which covers one possible situation. (The number of execution paths is bounded by the number of join tables.) During execution, Hive chooses the most efficient execution path to run based on the input file sizes.
#10 Let's take an example: two tables are joined on a join key. Let's say table x and table y. At compile time, Hive generates three execution paths. The first execution path does a map join by assuming table x is the big table and loading the other table, table y, into memory. The second also does a map join, assuming table y is the big table. Finally, the third execution path assumes both tables are too large to fit into memory, so a map join is not feasible, and it runs the original common join. Why are we doing this? It's because we don't know the data sizes of the join tables at compile time. Some of the join tables may be intermediate tables, generated by a subquery at run time. We only know the data sizes at execution time.
#11 Let's see what happens at execution time. When Task A is finished, we know the exact data size of each join table. If table X is the big table and the other table is small enough to fit into memory, Hive runs that execution path. However, if none of the tables can be loaded into memory, Hive runs the original common join execution path. By doing this, Hive can dynamically choose the most efficient execution path at the execution stage.
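
The run-time choice among the precompiled execution paths can be sketched as below. The selection rule is an assumption made for illustration (a map join is eligible when all tables other than the chosen big table fit under a size limit, and the largest eligible big table wins); the notes do not spell out Hive's exact rule. The 25M limit comes from the future-work note and is read here as 25,000,000 bytes, which is also an assumption.

```python
# Assumed limit on the total size of the small tables (25M from the notes, read as bytes).
SMALL_TABLE_LIMIT_BYTES = 25 * 1000 * 1000


def choose_execution_path(table_sizes, limit=SMALL_TABLE_LIMIT_BYTES):
    """Pick an execution path once the real input sizes are known.

    table_sizes maps a table alias to its size in bytes.
    Returns ("map_join", big_table_alias) or ("common_join", None).
    """
    best = None
    for candidate in table_sizes:
        # Everything except the candidate big table must be loaded into memory.
        small_total = sum(
            size for alias, size in table_sizes.items() if alias != candidate
        )
        if small_total > limit:
            continue
        # Among eligible candidates, stream the largest table and hash the rest.
        if best is None or table_sizes[candidate] > table_sizes[best]:
            best = candidate
    if best is None:
        return ("common_join", None)
    return ("map_join", best)
```

With two join tables there are three possible outcomes (x is big, y is big, or common join), which matches the three execution paths generated at compile time in the example.
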
#12 Since the local task needs to load the data into memory, it is a very memory-intensive task. So Hive launches this local task in a child JVM, which has the same heap size as the mappers. We have already carefully bounded the input data size of the small tables, but it is still possible that the local task runs out of memory. So the query processor measures the memory usage of the local task very carefully. Once the memory usage of the local task is higher than a threshold, the local task aborts itself, which means the table is too large to fit into memory and the map join fails. In this case, the query processor switches back to the original common join task and runs it as a backup task. All of this is totally transparent to the user.
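
The abort-and-fall-back behavior can be sketched like this. It is a simplified illustration: an entry count stands in for the real memory-usage threshold, a nested-loop join stands in for the common join task, and the exception and function names are hypothetical.

```python
class MapJoinMemoryExhausted(Exception):
    """Raised by the local task when the hashtable outgrows its budget."""


def build_hashtable_bounded(small_rows, max_entries, key_index=0):
    """Local task: build the hashtable, aborting once the (assumed) budget is exceeded."""
    table = {}
    count = 0
    for row in small_rows:
        count += 1
        if count > max_entries:
            raise MapJoinMemoryExhausted(f"more than {max_entries} entries")
        table.setdefault(row[key_index], []).append(row)
    return table


def map_join(big_rows, table, key_index=0):
    """Probe the in-memory hashtable for every big-table row."""
    return [row + m for row in big_rows for m in table.get(row[key_index], [])]


def common_join_backup(big_rows, small_rows, key_index=0):
    """Stand-in for the common join task; a nested loop keeps the sketch short."""
    return [
        b + s for b in big_rows for s in small_rows if b[key_index] == s[key_index]
    ]


def run_join(big_rows, small_rows, max_entries):
    """Try the map join first; if the local task aborts, run the backup task instead."""
    try:
        table = build_hashtable_bounded(small_rows, max_entries)
    except MapJoinMemoryExhausted:
        return "common_join", common_join_backup(big_rows, small_rows)
    return "map_join", map_join(big_rows, table)
```

The caller of `run_join` gets the same rows either way, which is the sense in which the fallback is transparent to the user.
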
#13 Let's discuss the performance bottleneck. Previously, the small table file was the performance bottleneck. In the new map join, however, the Distributed Cache is the potential performance bottleneck. If you upload a large hashtable file into the DC, say larger than 30 M, it really slows down the propagation speed of the DC. You will see that all the mappers are launched but stay in the initialization stage for a long time, which means they are waiting for the push from the DC. Right now, the solution is a straightforward workaround: we compress and archive all the hashtable files into a tar file.
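
The compress-and-archive workaround can be sketched with Python's standard `tarfile` module. The notes only say that the hashtable files go into a tar file; gzip compression, the file names, and the sample file contents below are assumptions made for this example.

```python
import os
import tarfile
import tempfile


def archive_hashtables(paths, archive_path):
    """Pack all hashtable files into one gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            tar.add(path, arcname=os.path.basename(path))
    return archive_path


def extract_hashtables(archive_path, dest_dir):
    """Mapper side: unpack the archive into a local directory and list its files."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return sorted(os.listdir(dest_dir))


def demo():
    """Archive two sample files and return (raw size, archive size, extracted names)."""
    with tempfile.TemporaryDirectory() as workdir:
        paths = []
        for i in range(2):
            path = os.path.join(workdir, f"t{i}.hashtable")
            with open(path, "wb") as f:
                f.write(b"key,value\n" * 10000)  # highly repetitive sample data
            paths.append(path)
        raw_size = sum(os.path.getsize(p) for p in paths)
        archive = archive_hashtables(paths, os.path.join(workdir, "hashtables.tar.gz"))
        out_dir = os.path.join(workdir, "mapper_local")
        os.mkdir(out_dir)
        names = extract_hashtables(archive, out_dir)
        return raw_size, os.path.getsize(archive), names
```

How much the archive shrinks depends entirely on the data; the repetitive sample content here compresses far better than real hashtable files are likely to.
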
#15 We have run several performance benchmarks to compare the runs with compression against the runs without compression. From the results, we can see that compression improves performance by 21% to 86%. We can also see that the larger the input data size is, the more performance we gain from compression, which is very reasonable. The larger the small table is, the larger the hashtable file it generates. The larger the big table is, the more mappers are launched. Both of these factors contribute to making the DC the bottleneck, and compression helps in this situation.
#16 Finally, let's see the performance comparison between the previous common join and the new optimized join, which is the common join converted into a map join. All of the tested benchmarks are eligible for conversion into a map join. The results show that join performance improves by 57% to 163% if the join can be converted into a map join.
#17 There are several pieces of future work. First, when this new optimized join is running in the cluster, it would be good to know how many join operations are converted into map joins and how many converted map joins fail because they run out of memory. We have already developed the hooks for this audit but still need time to deploy them in the cluster. Another thing is to set the number of replicas for the compressed hashtable files based on the number of mappers. Currently, the number of replicas for the hashtable files is 3. If we can set the replication number based on the number of mappers, it will improve the propagation speed. We also manually bounded the size of the small table to 25M, which may be a little conservative. If we can tune this parameter by sampling the data, we will get a more accurate limit for map join, and more queries can be converted into map joins. Finally, the local task can hold 2M unique key/value pairs in memory while consuming 1.47G of memory. By optimizing it to be more memory efficient, the local task can hold more data in memory, so more join queries can be converted into map joins.
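
As a rough back-of-the-envelope check on the last point, the figures in this note imply a per-entry memory cost for the in-memory hashtable. The sketch below assumes that "2M" means 2,000,000 entries and that "1.47G" means either gibibytes or decimal gigabytes; the notes do not say which.

```python
def bytes_per_entry(total_bytes, entries):
    """Average memory consumed per key/value pair."""
    return total_bytes / entries


ENTRIES = 2_000_000  # "2M unique key/value" from the notes, read as two million

# The unit of "1.47G" is not stated, so both readings are computed.
per_entry_binary = bytes_per_entry(1.47 * 1024**3, ENTRIES)   # G read as GiB
per_entry_decimal = bytes_per_entry(1.47 * 1000**3, ENTRIES)  # G read as 10^9 bytes

print(f"{per_entry_binary:.0f} bytes/entry (GiB reading)")
print(f"{per_entry_decimal:.0f} bytes/entry (decimal reading)")
```

Under either reading the cost is several hundred bytes per key/value pair, which suggests why a more memory-efficient hashtable would let more joins qualify as map joins.
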