48a tuning
- 2. Tuning MapReduce
Agenda
• Tuning MapReduce
• ExpressLane™
• Label-Based Scheduling
© 2012 MapR Technologies Tuning 2
- 3. Tuning MapReduce
Objectives
At the end of this module you will be able to:
• Effectively tune your MapReduce jobs
• Explain how ExpressLane works and what jobs it applies to in your cluster
• Configure label-based scheduling
- 5. Important Parameters
Number of task slots per node
Number of task slots on the cluster
Memory buffer size
JVM size
Speculative execution
- 6. Number of Task Slots per Node
Number of concurrent map and reduce tasks on a node
In mapred-site.xml
– mapred.tasktracker.map.tasks.maximum
– mapred.tasktracker.reduce.tasks.maximum
Recommendations:
– Map slots: 0.75 * # of cores (minimum 1)
– Reduce slots: 0.5 * # of cores (minimum 1)
Decrease map and reduce slots on CLDB nodes
Increase slots on nodes with more memory, disk, or network bandwidth
– E.g., reducers are network-bandwidth-intensive
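The per-node recommendations above can be turned into concrete values for the two mapred-site.xml properties. A minimal sketch, assuming fractional results are rounded down (the slide does not say how fractions are handled); the function name is illustrative only:

```python
import os


def recommended_slots(cores: int) -> tuple[int, int]:
    """Return (map_slots, reduce_slots) for a node with `cores` CPU cores.

    Map slots: 0.75 * cores, reduce slots: 0.5 * cores, each at least 1.
    Rounding down is an assumption made for this sketch.
    """
    map_slots = max(1, int(cores * 0.75))
    reduce_slots = max(1, int(cores * 0.5))
    return map_slots, reduce_slots


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    maps, reduces = recommended_slots(cores)
    print(f"mapred.tasktracker.map.tasks.maximum = {maps}")
    print(f"mapred.tasktracker.reduce.tasks.maximum = {reduces}")
```

For an 8-core node this gives 6 map slots and 4 reduce slots; on CLDB nodes you would then lower these values by hand.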
- 7. Number of Task Slots on the Cluster
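How many map and reduce tasks each job runs, relative to the slots available on the cluster
In mapred-site.xml (per-job defaults; a job can override them)
– mapred.map.tasks (only a hint: the actual number of map tasks follows the number of input splits)
– mapred.reduce.tasks (sets the number of reduce tasks directly)
• mapred.reduce.tasks is often the best parameter to tune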
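One widely used starting point for mapred.reduce.tasks comes from the Apache Hadoop MapReduce tutorial (not from this deck): 0.95 or 1.75 times the cluster's reduce capacity, where capacity is nodes × mapred.tasktracker.reduce.tasks.maximum. A minimal sketch, assuming every node has the same number of reduce slots; the function names are illustrative:

```python
def cluster_reduce_capacity(nodes: int, reduce_slots_per_node: int) -> int:
    """Total reduce slots on the cluster, assuming identical nodes."""
    return nodes * reduce_slots_per_node


def suggested_reduce_tasks(nodes: int, reduce_slots_per_node: int) -> tuple[int, int]:
    """Return (single_wave, two_wave) candidates for mapred.reduce.tasks.

    0.95 x capacity: all reducers launch at once, leaving a few slots free
    for failed or speculative task attempts.
    1.75 x capacity: faster nodes run a second wave of reducers, which
    balances load better.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    capacity = cluster_reduce_capacity(nodes, reduce_slots_per_node)
    return capacity * 95 // 100, capacity * 175 // 100
```

For example, 10 nodes with 4 reduce slots each give a capacity of 40 and candidates of 38 (one wave) or 70 (two waves).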
- 8. Memory Buffer Size
Memory buffer (in MB) that a map task uses to sort its output before spilling it to disk for the shuffle
– io.sort.mb
Set to about 2x the block size
– Use hadoop mfs to see the block (chunk) size
If set too low, map output spills to disk more often, lowering performance
– Spills are visible in MapR Metrics
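The "about 2x block size" rule is simple arithmetic once the block size is known in bytes. A minimal sketch, assuming you have already read the block (chunk) size, for example from hadoop mfs output; the function name is illustrative:

```python
MB = 1024 * 1024


def recommended_io_sort_mb(block_size_bytes: int) -> int:
    """Return an io.sort.mb value (in MB) of roughly 2x the block size.

    block_size_bytes is the file's block (chunk) size in bytes.
    """
    return 2 * block_size_bytes // MB
```

A 256 MB block (268435456 bytes) yields io.sort.mb = 512; a 64 MB block yields 128.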
- 9. JVM Size
Size of the child JVM that runs a map or reduce task
– mapred.map.child.java.opts: set the heap (-Xmx) to about 2x io.sort.mb
– mapred.reduce.child.java.opts: leave at the default setting
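Because the map JVM must hold the io.sort.mb buffer plus everything else the task needs, the heap is sized from that buffer. A minimal sketch that builds a java.opts value with a heap of 2x io.sort.mb and checks an existing value; the helper names are illustrative, and only the m/g heap suffixes are parsed:

```python
import re
from typing import Optional


def map_child_java_opts(io_sort_mb: int, extra_opts: str = "") -> str:
    """Build a mapred.map.child.java.opts value with heap = 2x io.sort.mb."""
    heap_mb = 2 * io_sort_mb
    return f"-Xmx{heap_mb}m {extra_opts}".strip()


def heap_mb_from_opts(opts: str) -> Optional[int]:
    """Return the -Xmx heap size in MB from a java.opts string, or None.

    Only the 'm'/'M' and 'g'/'G' suffixes are handled in this sketch.
    """
    match = re.search(r"-Xmx(\d+)([mMgG])", opts)
    if match is None:
        return None
    size, unit = int(match.group(1)), match.group(2).lower()
    return size * 1024 if unit == "g" else size


def heap_fits_sort_buffer(opts: str, io_sort_mb: int) -> bool:
    """True if the configured heap is at least 2x io.sort.mb."""
    heap = heap_mb_from_opts(opts)
    return heap is not None and heap >= 2 * io_sort_mb
```

With io.sort.mb = 512, map_child_java_opts(512) returns "-Xmx1024m".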
- 10. Speculative Execution
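Runs backup attempts of unusually slow tasks on other nodes and keeps whichever attempt finishes first
To enable, set to true in mapred-site.xml: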
– mapred.map.tasks.speculative.execution
– mapred.reduce.tasks.speculative.execution
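Both speculative-execution switches live in mapred-site.xml as standard Hadoop property elements. A minimal sketch that generates those elements with the Python standard library; the function name is illustrative, and you would merge the output into your existing configuration file rather than replace it:

```python
import xml.etree.ElementTree as ET


def speculative_properties(enabled: bool = True) -> str:
    """Return a <configuration> fragment setting both speculation flags."""
    config = ET.Element("configuration")
    for name in (
        "mapred.map.tasks.speculative.execution",
        "mapred.reduce.tasks.speculative.execution",
    ):
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = "true" if enabled else "false"
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    print(speculative_properties())
```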
- 12. ExpressLane™
Allows small jobs to run even when all slots are occupied
Applies only when the cluster is busy and the job meets the small-job criteria specified
in mapred-site.xml
Check the documentation for the ExpressLane criteria
– http://mapr.com/doc/display/MapR/ExpressLane
Note: jobs that fit the small-job definition but turn out to be larger
than anticipated are killed and requeued for normal execution
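The admission logic described above can be sketched as two checks: is the cluster busy, and does the job meet the small-job limits. This is not MapR's implementation. The limit fields, the Job and SmallJobLimits classes, and the requeue test are hypothetical stand-ins; the real ExpressLane criteria and their property names are in the documentation linked above.

```python
from dataclasses import dataclass


@dataclass
class SmallJobLimits:
    # Hypothetical thresholds standing in for the real ExpressLane criteria.
    max_maps: int
    max_reduces: int
    max_input_bytes: int


@dataclass
class Job:
    # Hypothetical job description used only for this sketch.
    num_maps: int
    num_reduces: int
    input_bytes: int


def is_small_job(job: Job, limits: SmallJobLimits) -> bool:
    """True if the job falls within every small-job limit."""
    return (
        job.num_maps <= limits.max_maps
        and job.num_reduces <= limits.max_reduces
        and job.input_bytes <= limits.max_input_bytes
    )


def use_express_lane(job: Job, limits: SmallJobLimits, free_slots: int) -> bool:
    """ExpressLane only matters when the cluster is busy (no free slots)."""
    return free_slots == 0 and is_small_job(job, limits)


def should_requeue(actual_input_bytes: int, limits: SmallJobLimits) -> bool:
    """A job admitted as small that turns out larger is killed and requeued."""
    return actual_input_bytes > limits.max_input_bytes
```

When slots are free, the job simply runs through the normal scheduler, which is why use_express_lane returns False in that case.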
- 14. Label-Based Scheduling
Restrict job execution to a subset of nodes within the cluster
– E.g., by hardware configuration or department
Admin applies one or more labels to nodes
User specifies a label when submitting a job
Admin can specify a default or override label per queue
- 15. Label-Based Scheduling
On the JobTracker node, in mapred-site.xml:
mapreduce.jobtracker.node.labels.file = <path to node-label mapping file>
– Within the mapping file, each line uses the format
<node pattern/regex> <labels>
– Examples
hadoop-prod-0* qa
/hadoop-prod-1.*/ sales, product, 4_disks
hadoop-prod-2 12_disks, engineering
hadoop-prod-3 big_ram, support
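The mapping-file format above can be read with a small parser that resolves the labels for a given hostname. The slide does not spell out how each pattern is interpreted, so this sketch assumes that slash-delimited entries are regular expressions that must match the whole hostname, that other entries are shell-style globs (a plain hostname being the no-wildcard case), and that a node matching several lines gets the union of their labels. The function names are illustrative.

```python
import fnmatch
import re


def parse_label_file(text: str) -> list[tuple[str, set[str]]]:
    """Parse '<node pattern/regex> <labels>' lines into (pattern, labels)."""
    entries = []
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if not parts:
            continue  # skip blank lines
        pattern = parts[0]
        label_part = parts[1] if len(parts) > 1 else ""
        labels = {label.strip() for label in label_part.split(",") if label.strip()}
        entries.append((pattern, labels))
    return entries


def node_matches(pattern: str, hostname: str) -> bool:
    """Regex if wrapped in slashes (assumed full match), otherwise a glob."""
    if len(pattern) > 1 and pattern.startswith("/") and pattern.endswith("/"):
        return re.fullmatch(pattern[1:-1], hostname) is not None
    return fnmatch.fnmatchcase(hostname, pattern)


def labels_for_node(entries: list[tuple[str, set[str]]], hostname: str) -> set[str]:
    """Union of the labels from every line whose pattern matches hostname."""
    labels: set[str] = set()
    for pattern, node_labels in entries:
        if node_matches(pattern, hostname):
            labels |= node_labels
    return labels
```

Using the four example lines from the slide, hadoop-prod-15 resolves to {sales, product, 4_disks}, while a host such as hadoop-prod-20 matches no line and gets no labels.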
- 16. Label-Based Scheduling
Specify a label when submitting a job on the hadoop command line
mapred.job.label = <label>
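On the command line, a job property such as mapred.job.label is normally passed as a generic -D option. This only works if the job's driver uses ToolRunner (GenericOptionsParser), and the option must come before the job's own arguments. A minimal sketch that builds such a command; the jar name, main class, and paths are hypothetical:

```python
import shlex


def submit_command(jar: str, main_class: str, label: str, *job_args: str) -> list[str]:
    """Build a `hadoop jar` command that sets mapred.job.label via -D.

    The -D generic option is placed before the job's own arguments, as
    GenericOptionsParser expects.
    """
    return ["hadoop", "jar", jar, main_class, f"-Dmapred.job.label={label}", *job_args]


if __name__ == "__main__":
    # Hypothetical jar, class, and paths, for illustration only.
    cmd = submit_command("wordcount.jar", "com.example.WordCount", "big_ram", "/in", "/out")
    print(shlex.join(cmd))
```

This prints hadoop jar wordcount.jar com.example.WordCount -Dmapred.job.label=big_ram /in /out. Passing the list to subprocess.run would execute it on a node with the hadoop client installed.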
- 17. Label-Based Scheduling
Default label per queue
– Examples
mapred.queue.<queue-name>.label = <label>
mapred.queue.<queue-name>.label.policy = <PREFER_QUEUE | PREFER_JOB | AND | OR>
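The four policy values decide how a queue's label combines with a job's label. The slide lists only the names, so the semantics in this sketch are an assumed reading, not taken from the MapR source: PREFER_QUEUE uses the queue label when set (else the job label), PREFER_JOB uses the job label when set (else the queue label), AND requires nodes to carry both labels, and OR accepts nodes carrying either. With no label at all, every node is eligible. The function name is illustrative.

```python
from typing import Optional


def eligible_nodes(
    node_labels: dict[str, set[str]],
    job_label: Optional[str],
    queue_label: Optional[str],
    policy: str,
) -> set[str]:
    """Return the nodes a job may run on under the given label policy."""
    if policy == "PREFER_QUEUE":
        wanted = {queue_label or job_label}
    elif policy == "PREFER_JOB":
        wanted = {job_label or queue_label}
    elif policy in ("AND", "OR"):
        wanted = {job_label, queue_label}
    else:
        raise ValueError(f"unknown label policy: {policy}")
    wanted.discard(None)  # ignore labels that were not set
    if not wanted:
        return set(node_labels)  # no label constraint at all
    combine = any if policy == "OR" else all
    return {
        node
        for node, labels in node_labels.items()
        if combine(label in labels for label in wanted)
    }
```

For a job labeled big_ram submitted to a queue labeled sales, PREFER_JOB keeps only the big_ram nodes, PREFER_QUEUE keeps only the sales nodes, OR keeps both groups, and AND keeps only nodes carrying both labels.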