48a tuning

  1. Tuning MapReduce (7/6/2012), © 2012 MapR Technologies
  2. Agenda
     • Tuning MapReduce
     • ExpressLane™
     • Label-Based Scheduling
  3. Objectives
     At the end of this module you will be able to:
     • Effectively tune your MapReduce jobs
     • Explain how ExpressLane works and what jobs it applies to in your cluster
     • Configure label-based scheduling
  4. Tuning MapReduce
  5. Important Parameters
     • Number of task slots per node
     • Number of task slots on the cluster
     • Memory buffer size
     • JVM size
     • Speculative execution
  6. Number of Task Slots per Node
     • Number of concurrent map and reduce tasks on a node
     • In mapred-site.xml
         – mapred.tasktracker.map.tasks.maximum
         – mapred.tasktracker.reduce.tasks.maximum
     • Recommendations:
         – Map slots: 0.75 * # of cores (minimum 1)
         – Reduce slots: 0.5 * # of cores (minimum 1)
     • Decrease map and reduce slots on CLDB nodes
     • Increase slots on nodes with more memory, disk, network bandwidth
         – E.g. reducers are bandwidth-intensive
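The per-node recommendations above (0.75 and 0.5 slots per core, minimum 1) can be sketched as a small helper. This is only an illustration: the function name `recommended_slots` is hypothetical, and rounding down to a whole slot is an assumption, since the slide gives the multipliers but not a rounding rule.

```python
import math
from typing import Tuple


def recommended_slots(cores: int) -> Tuple[int, int]:
    """Return (map_slots, reduce_slots) for a node with `cores` CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # Assumption: round down to a whole slot; the slide does not say how to round.
    map_slots = max(1, math.floor(0.75 * cores))
    reduce_slots = max(1, math.floor(0.5 * cores))
    return map_slots, reduce_slots


if __name__ == "__main__":
    for cores in (1, 8, 16):
        map_slots, reduce_slots = recommended_slots(cores)
        print(
            f"{cores} cores: "
            f"mapred.tasktracker.map.tasks.maximum={map_slots}, "
            f"mapred.tasktracker.reduce.tasks.maximum={reduce_slots}"
        )
```

The results are starting points only; per the slide, CLDB nodes would be set lower and nodes with more memory, disk, or network bandwidth higher.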
  7. Number of Task Slots on the Cluster
     • How many concurrent map and reduce tasks can run
     • In mapred-site.xml
         – mapred.map.tasks
         – mapred.reduce.tasks
     • Best parameter to tune
  8. Memory Buffer Size
     • Memory used by map task for output during shuffle
         – io.sort.mb
     • Set to about 2x block size
         – Use hadoop mfs to see block size
     • If set too low, spills will result in lower performance
         – Visible in MapR Metrics
  9. JVM Size
     • Size of child JVM that runs a map or reduce task
         – mapred.map.child.java.opts: set to about 2x io.sort.mb
         – mapred.reduce.child.java.opts: leave at default setting
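The two rules of thumb chain together: io.sort.mb is about 2x the block size, and the map child JVM is about 2x io.sort.mb. A minimal sketch of that arithmetic follows. The function name `buffer_and_heap` and the 64 MB block size are hypothetical, and expressing the JVM size as an `-Xmx` heap flag is an assumption about how the "2x" guidance is applied.

```python
from typing import Dict


def buffer_and_heap(block_size_mb: int) -> Dict[str, str]:
    """Derive io.sort.mb and the map child JVM options from the block size."""
    if block_size_mb < 1:
        raise ValueError("block_size_mb must be positive")
    io_sort_mb = 2 * block_size_mb   # "about 2x block size"
    map_heap_mb = 2 * io_sort_mb     # "about 2x io.sort.mb"
    # mapred.reduce.child.java.opts is deliberately omitted: leave at default.
    return {
        "io.sort.mb": str(io_sort_mb),
        # Assumption: the JVM size is set through the -Xmx maximum-heap flag.
        "mapred.map.child.java.opts": f"-Xmx{map_heap_mb}m",
    }


if __name__ == "__main__":
    # 64 MB is an illustrative value only; check the real block size first.
    for name, value in buffer_and_heap(64).items():
        print(f"{name} = {value}")
```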
  10. Speculative Execution
      • Set to true:
          – mapred.map.tasks.speculative.execution
          – mapred.reduce.tasks.speculative.execution
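For readers who have not edited mapred-site.xml before, the sketch below renders the two speculative-execution properties in the standard Hadoop `<configuration>`/`<property>` layout. The helper name `build_configuration` is hypothetical, and the output is a fragment to merge into an existing file by hand, not a complete mapred-site.xml.

```python
import xml.etree.ElementTree as ET
from typing import Dict

SPECULATIVE = {
    "mapred.map.tasks.speculative.execution": "true",
    "mapred.reduce.tasks.speculative.execution": "true",
}


def build_configuration(properties: Dict[str, str]) -> str:
    """Render name/value pairs as a Hadoop-style <configuration> element."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_configuration(SPECULATIVE))
```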
  11. ExpressLane™
  12. ExpressLane™
      • Allows a small job to run when all slots are occupied
      • Only applies when the cluster is busy and the job meets the criteria specified in mapred-site.xml
      • Check the documentation for ExpressLane criteria
          – http://mapr.com/doc/display/MapR/ExpressLane
      • Note: jobs that fit the small-job definition but turn out to be larger than anticipated are killed and re-queued for normal execution
  13. Label-Based Scheduling
  14. Label-Based Scheduling
      • Restrict job execution to a set of nodes within the cluster
          – By hardware config, department, etc.
      • Admin applies label(s) to nodes
      • User specifies label when submitting job
      • Admin can specify default/override label per queue
  15. Label-Based Scheduling
      • On a JobTracker node, in mapred-site.xml:
          mapreduce.jobtracker.node.labels.file = <path to node-label mapping file>
          – Within the mapping file, each line uses the format
              <node pattern/regex> <labels>
          – Examples
              hadoop-prod-0*       qa
              /hadoop-prod-1.*/    sales, product, 4_disks
              hadoop-prod-2        12_disks, engineering
              hadoop-prod-3        big_ram, support
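To make the mapping-file format concrete, here is a rough sketch of how such a file could be read and matched against a node name. It is not the JobTracker's actual logic. The function names are hypothetical, and three behaviors are assumptions: a pattern wrapped in slashes is treated as a regular expression, any other pattern as a shell-style glob, and a node collects the labels of every line it matches.

```python
import re
from fnmatch import fnmatchcase
from typing import List, Tuple

# The example lines from the slide.
EXAMPLE_MAPPING = """\
hadoop-prod-0* qa
/hadoop-prod-1.*/ sales, product, 4_disks
hadoop-prod-2 12_disks, engineering
hadoop-prod-3 big_ram, support
"""


def parse_mapping(text: str) -> List[Tuple[str, List[str]]]:
    """Split each line into (pattern, [labels]); labels are comma-separated."""
    entries = []
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or incomplete lines
        pattern, label_part = parts
        labels = [lbl.strip() for lbl in label_part.split(",") if lbl.strip()]
        entries.append((pattern, labels))
    return entries


def matches(pattern: str, node: str) -> bool:
    """Assumption: /.../ means regex (full match), anything else is a glob."""
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return re.fullmatch(pattern[1:-1], node) is not None
    return fnmatchcase(node, pattern)


def labels_for(node: str, entries: List[Tuple[str, List[str]]]) -> List[str]:
    """Collect labels from every mapping line whose pattern matches the node."""
    found: List[str] = []
    for pattern, labels in entries:
        if matches(pattern, node):
            found.extend(labels)
    return found


if __name__ == "__main__":
    entries = parse_mapping(EXAMPLE_MAPPING)
    for node in ("hadoop-prod-01", "hadoop-prod-12", "hadoop-prod-3"):
        print(node, labels_for(node, entries))
```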
  16. Label-Based Scheduling
      • Specify a label when submitting a job on the hadoop command line:
          mapred.job.label = <label>
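One way the label might be passed at submission time is as a `-D` generic option on `hadoop jar`. The sketch below only assembles such a command line. The jar name, class name, paths, and the helper `job_command` are hypothetical, and it assumes the job parses generic options (for example through ToolRunner), which is what lets `-Dmapred.job.label=...` reach the job configuration.

```python
import shlex
from typing import List


def job_command(jar: str, main_class: str, label: str, args: List[str]) -> List[str]:
    """Build a `hadoop jar` argument list that sets mapred.job.label."""
    # Assumption: generic -D options follow the main class and precede the
    # job's own arguments, and the job honors them (e.g. via ToolRunner).
    return ["hadoop", "jar", jar, main_class, f"-Dmapred.job.label={label}", *args]


if __name__ == "__main__":
    # All names below are placeholders for illustration.
    cmd = job_command("my-job.jar", "com.example.MyJob", "big_ram", ["/in", "/out"])
    print(" ".join(shlex.quote(part) for part in cmd))
```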
  17. Label-Based Scheduling
      • Default label per queue
          – Examples
              mapred.queue.<queue-name>.label = <label>
              mapred.queue.<queue-name>.label.policy = <PREFER_QUEUE | PREFER_JOB | AND | OR>
  18. Questions
