Shuffle sort 101
1
2
3
4
When the buffer fills past the io.sort.spill.percent threshold, a spill thread starts. The spill thread begins at the start of the buffer and starts spilling keys and values to disk. If the buffer fills up before the spill is complete, the mapper blocks until the spill finishes. The spill is complete when the buffer has been completely flushed. The mapper then continues to fill the buffer until another spill begins, and it loops like this until the mapper has emitted all of its K,V pairs.
A larger value for io.sort.mb means more K,V pairs fit in memory, so you experience fewer spills. Tuning io.sort.spill.percent can give the spill thread more time to finish before the buffer fills, so the mapper blocks less often.
5
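As a rough sketch of how these two map-side knobs are typically set per job. This assumes the classic MR1 property names io.sort.mb and io.sort.spill.percent (exact names vary by Hadoop version), and the values shown are illustrative, not recommendations:

```java
import org.apache.hadoop.conf.Configuration;

public class SortBufferTuning {
    public static Configuration tuned() {
        Configuration conf = new Configuration();
        // Give the map-side sort buffer 256 MB (the old default is 100 MB) so more
        // key/value pairs fit in memory and the mapper spills less often.
        conf.setInt("io.sort.mb", 256);
        // Start spilling at 80% full; the remaining headroom lets the mapper
        // keep writing while the spill thread drains the buffer to disk.
        conf.setFloat("io.sort.spill.percent", 0.80f);
        return conf;
    }
}
```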
Another threshold parameter is io.sort.record.percent. This fraction of the buffer is set aside for the accounting info that is required for each record. If the accounting area fills up, a spill begins even if the data area still has room. The amount of room required for accounting is a function of the number of records, not the record size. Therefore, a job that emits a large number of small records may need a larger accounting area to reduce spilling.
6
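A back-of-the-envelope way to see how the accounting area limits the record count. This assumes the commonly cited figure of 16 bytes of accounting data per record in pre-MAPREDUCE-64 Hadoop and the old default io.sort.record.percent of 0.05; treat both constants as illustrative:

```java
public class RecordAccountingEstimate {
    // Rough estimate of how many records the accounting area can track before
    // it forces a spill, independent of how big each record's data is.
    public static long maxRecordsBeforeSpill(int ioSortMb, float recordPercent) {
        long bufferBytes = (long) ioSortMb * 1024 * 1024;
        long accountingBytes = (long) (bufferBytes * recordPercent);
        final int BYTES_PER_RECORD = 16;   // assumed per-record accounting cost
        return accountingBytes / BYTES_PER_RECORD;
    }

    public static void main(String[] args) {
        // With io.sort.mb = 100 and io.sort.record.percent = 0.05,
        // roughly 100 MB * 0.05 / 16 B = 327,680 records fit before the
        // accounting area alone triggers a spill.
        System.out.println(maxRecordsBeforeSpill(100, 0.05f));
    }
}
```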
From MAPREDUCE-64.
The point here is that the buffer is actually a circular data structure with two parts: the key/value index and the data buffer. The key/value index is the “accounting info”. MAPREDUCE-64 essentially patches the framework so that io.sort.record.percent is auto-tuned instead of being set manually.
7
This is a diagram of a single spill. The result is a partitioned, possibly combined spill file sitting in one of the locations of mapred.local.dir on local disk.
This is a “hot path” in the code. Spills happen often, and there are insertion points for user/developer code: the partitioner, more importantly the combiner, and most importantly the key comparator and the value grouping comparator. If you don’t include a combiner, or you have an ineffective combiner, then you’re spilling more data through the entire cycle. If your comparators are inefficient, your whole sort process slows. A sketch of wiring these hooks into a job follows below.
8
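A minimal sketch of attaching these hooks with the mapreduce API. Reusing IntSumReducer as the combiner only works because summing is associative and commutative, and the sort comparator shown is already Text's default; it is set explicitly here only to mark where the hook lives:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class SpillPathHooks {
    public static Job configure(Configuration conf) throws Exception {
        Job job = new Job(conf, "spill path hooks");           // MR1-era constructor
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setPartitionerClass(HashPartitioner.class);        // chooses the partition for each key
        job.setCombinerClass(IntSumReducer.class);              // shrinks data before every spill
        job.setSortComparatorClass(Text.Comparator.class);      // the key comparator on the hot path
        return job;
    }
}
```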
This illustrates how a tasktracker’s mapred.local.dir might look towards the end of a map task that is processing a large volume of data. Spill files are written to disk round-robin across the directories specified by mapred.local.dir. Each spill file is partitioned and sorted within the context of a single RAM-sized chunk of data.
Before those files can be served to the reducers, they have to be merged. But how do you merge files that are each already about as large as the buffer?
9
The good news is that it’s computationally very inexpensive to merge sorted sets to produce a final sorted set. However, it is very IO intensive.
This slide illustrates the spill/merge cycle required to merge the multiple spill files into a single output file ready to be served to the reducer. The example illustrates the relationship between io.sort.factor (2 here, for illustration) and the number of merges. The smaller io.sort.factor is, the more merge rounds and spills are required, the more disk IO you do, and the slower your job runs. The larger it is, the more memory is required, but the faster things go. A developer can tweak these settings per job, and it’s very important to do so, because they directly affect the IO characteristics (and thus performance) of your MapReduce job.
In real life, io.sort.factor defaults to 10, and this still leads to too many spills and merges when data really scales. You can increase io.sort.factor to 100 or more on large clusters or big data sets.
10
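A quick way to see why io.sort.factor matters: with S spill files and a merge factor F, the merge needs roughly ceil(log_F(S)) passes over the data. The real merge planner in Hadoop is slightly cleverer about sizing the first pass, so treat this small calculation as an approximation:

```java
public class MergePassEstimate {
    // Approximate number of merge passes needed to reduce `spills` files
    // down to one, merging at most `factor` files at a time.
    public static int mergePasses(int spills, int factor) {
        int passes = 0;
        while (spills > 1) {
            spills = (int) Math.ceil((double) spills / factor);
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        System.out.println(mergePasses(50, 10));   // ~2 passes with the default factor of 10
        System.out.println(mergePasses(50, 100));  // 1 pass once the factor exceeds the spill count
    }
}
```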
In this crude illustration, we’ve increased io.sort.factor from 2 to 3. In this case, we cut the number of merges required to achieve the same result in half. This cuts down the number of spills, the number of times the combiner is called, and saves one full pass through the entire data set. As you can see, io.sort.factor is a very important parameter!
11
Reducers obtain data from mappers via HTTP calls. Each HTTP connection has to be serviced by an HTTP thread. The number of HTTP threads running on a tasktracker dictates how many reducers can fetch from it in parallel. For illustration purposes here, we set the value to 1 and watch all the other reducers queue up. This slows things down.
12
Increasing the number of HTTP threads increases the amount of parallelism we can
achieve in the shuffle-sort phase, transferring data to the reducers.
13
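In MR1 the number of server threads on each tasktracker is controlled by tasktracker.http.threads. This is a cluster-level setting read at tasktracker startup rather than a per-job one; a sketch of raising it from its commonly documented default of 40:

```java
import org.apache.hadoop.conf.Configuration;

public class ShuffleServerTuning {
    public static Configuration tuned() {
        Configuration conf = new Configuration();
        // More HTTP threads per tasktracker means more reducers can pull map
        // output from it in parallel. Because the tasktracker reads this at
        // startup, it belongs in the cluster's mapred-site.xml, not a job config.
        conf.setInt("tasktracker.http.threads", 80);
        return conf;
    }
}
```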
14
Reducers obtain data from mappers via HTTP calls. Each HTTP connection has to be serviced by an HTTP thread. The number of HTTP threads running on a tasktracker dictates how many reducers can fetch from it in parallel. For illustration purposes here, we set the value to 1 and watch all the other reducers queue up. This slows things down.
15
The parallel copies configuration (mapred.reduce.parallel.copies) allows the reducer to retrieve map output from multiple mappers out in the cluster in parallel.
If the reducer experiences a connection failure fetching from a mapper, it retries, backing off exponentially in a loop until the value of mapred.reduce.copy.backoff is exceeded. Then we time out and that fetch is declared failed.
16
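A sketch of the two reduce-side fetch settings mentioned here, using the MR1 property names; the defaults noted in the comments are the commonly documented ones, and the values set are illustrative:

```java
import org.apache.hadoop.conf.Configuration;

public class ReduceFetchTuning {
    public static Configuration tuned() {
        Configuration conf = new Configuration();
        // Fetch map output from more mappers at once (the documented default is 5).
        conf.setInt("mapred.reduce.parallel.copies", 10);
        // Allow up to 300 seconds of exponential back-off against a flaky mapper
        // before the fetch is declared failed (300 is also the documented default).
        conf.setInt("mapred.reduce.copy.backoff", 300);
        return conf;
    }
}
```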
“That which is written must be read.”
In a process very similar to the one by which map output is spilled and merged to create the mapper’s final output file, the output from multiple mappers must be read, merged, and spilled to create the input for the reduce function. The final merged reduce input is not necessarily written to disk as a spill file; rather, it is fed to reduce() as its input.
This means that if you have a mistake or a misconfiguration that is slowing you down on the map side, the same configuration mistake slows you down again on the reduce side. When you don’t have combiners in the mix reducing the number of map outputs, the problem is compounded.
17
Suppose K is really a composite key that can be expanded into fields K1, K2, … Kn. For the mapper, we set the sort comparator to respect ALL parts of that key.
For the reducer, however, we set a “grouping comparator” which respects only a SUBSET of those fields. All keys that are equal by this subset are sent to the same call to reduce().
The result is that keys that are equal by the “grouping comparator” go to the same call to reduce() with their associated values, which have already been sorted by the more precise key. A minimal job setup is sketched below.
18
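A minimal sketch of wiring the two comparators into a job. It assumes the composite key is serialized as a tab-separated Text of the form "K1\tK2"; real jobs usually use a custom WritableComparable with raw-byte comparators for speed, so treat the classes below as illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;

public class SecondarySortSetup {

    /** Orders by the FULL composite key, assumed to be a Text of the form "K1\tK2". */
    public static class FullKeyComparator extends WritableComparator {
        public FullKeyComparator() { super(Text.class, true); }
        @Override
        @SuppressWarnings("rawtypes")
        public int compare(WritableComparable a, WritableComparable b) {
            return a.toString().compareTo(b.toString());
        }
    }

    /** Groups reduce() calls by K1 only; values arrive already sorted by the full key. */
    public static class NaturalKeyGroupingComparator extends WritableComparator {
        public NaturalKeyGroupingComparator() { super(Text.class, true); }
        @Override
        @SuppressWarnings("rawtypes")
        public int compare(WritableComparable a, WritableComparable b) {
            String k1a = a.toString().split("\t", 2)[0];
            String k1b = b.toString().split("\t", 2)[0];
            return k1a.compareTo(k1b);
        }
    }

    public static Job configure(Configuration conf) throws Exception {
        Job job = new Job(conf, "secondary sort");
        job.setMapOutputKeyClass(Text.class);
        job.setSortComparatorClass(FullKeyComparator.class);                 // respects ALL key fields
        job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class);  // respects only K1
        return job;
    }
}
```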
This slide illustrates the secondary sort process independently of the shuffle-sort. The sort comparator orders every key/value pair. The grouping comparator just determines equivalence in terms of which calls to reduce() get which data elements. The catch is that the grouping comparator has to respect the rules of the sort comparator: it can only be less restrictive. In other words, values whose keys appear equal to the grouping comparator will go to the same call to reduce(). The value grouping does not actually reorder any values.
19
In this crude illustration, we’ve increased io.sort.factor from 2 to 3. In this case, we cut the number of merges required to achieve the same result in half. This cuts down the number of spills and saves one full pass through the entire data set. As you can see, io.sort.factor is a very important parameter!
20
The size of the reducer’s shuffle buffer is specified by mapred.job.shuffle.input.buffer.percent, as a percentage of the total heap allocated to the reduce task. When this buffer fills, map outputs spill to disk and have to be merged later. The spill begins when the mapred.job.shuffle.merge.percent threshold is reached; this is specified as a percentage of the input buffer size. You can increase this value to reduce the number of trips to disk in the reduce phase.
Another parameter to pay attention to is mapred.inmem.merge.threshold, which is expressed as a number of accumulated map outputs. When this count is reached, we spill to disk. If your mappers explode the data the way wordcount does, consider setting this value to zero.
21
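A sketch of the reduce-side knobs mentioned here, using the standard MR1 property names; the defaults noted in the comments are the usual documented ones, and the values set are illustrative:

```java
import org.apache.hadoop.conf.Configuration;

public class ReduceBufferTuning {
    public static Configuration tuned() {
        Configuration conf = new Configuration();
        // Fraction of the reduce task's heap used to buffer incoming map output
        // (documented default 0.70).
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
        // Start the in-memory merge when the buffer is this full (default 0.66);
        // raising it means fewer, larger spills to disk.
        conf.setFloat("mapred.job.shuffle.merge.percent", 0.80f);
        // Also merge after this many map outputs have accumulated (default 1000);
        // 0 disables the count-based trigger so only the size threshold applies.
        conf.setInt("mapred.inmem.merge.threshold", 0);
        return conf;
    }
}
```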
In addition to being a little funny, the point here is that while there are a lot of tunables to consider in Hadoop, you really only need to focus on a few at a time in order to get optimal performance out of any specific job.
Cluster administrators typically set default values for these tunables, but those defaults are really best guesses based on their understanding of Hadoop and of the jobs users will submit to the cluster. Any user can submit a job that cripples a cluster, so in the interests of themselves and the other users, it behooves developers to understand and override these configurations.
22
23
24
25
These numbers will grow with scale, but the ratios will remain roughly the same. Therefore, you should be able to tune your MapReduce job on small data sets before unleashing it on large data sets.
26
27
Start with a naïve implementation of wordcount with no combiner, and tune io.sort.mb and io.sort.factor down to very small values. Run with these settings on a very small data set. Then run again on a data set twice the size. Now tune io.sort.mb and/or io.sort.factor up. Also play with mapred.inmem.merge.threshold.
Now add a combiner.
Now tweak the wordcount mapper to keep a local in-memory hash of counts (sketched below). This consumes more memory in the mapper, but reduces the data going into combine() and also reduces the amount of data spilled.
On each run, note the counters. What works best for you?
28
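A sketch of the in-mapper combining variant described in the last step: the mapper holds a local HashMap of word counts and emits them in cleanup(), trading mapper memory for far fewer spilled records. Class and variable names here are illustrative:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class InMapperCombiningWordCount
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    protected void map(LongWritable key, Text value, Context context) {
        // Accumulate counts locally instead of emitting (word, 1) for every token.
        for (String word : value.toString().split("\\s+")) {
            if (word.isEmpty()) continue;
            Integer current = counts.get(word);
            counts.put(word, current == null ? 1 : current + 1);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Emit one (word, localCount) pair per distinct word seen by this mapper,
        // so far less data hits the sort buffer, the combiner, and the spill files.
        Text outKey = new Text();
        IntWritable outValue = new IntWritable();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            outKey.set(e.getKey());
            outValue.set(e.getValue());
            context.write(outKey, outValue);
        }
    }
}
```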
29
30
31
32
33
