48a tuning
 

    48a tuning: Presentation Transcript

    • Tuning MapReduce (7/6/2012, © 2012 MapR Technologies)
    • Agenda
        • Tuning MapReduce
        • ExpressLane™
        • Label-Based Scheduling
    • Objectives
        At the end of this module you will be able to:
        • Effectively tune your MapReduce jobs
        • Explain how ExpressLane works and which jobs it applies to in your cluster
        • Configure label-based scheduling
    • Tuning MapReduce
    • Important Parameters
        • Number of task slots per node
        • Number of task slots on the cluster
        • Memory buffer size
        • JVM size
        • Speculative execution
    • Number of Task Slots per Node
        • Number of concurrent map and reduce tasks on a node
        • In mapred-site.xml:
            – mapred.tasktracker.map.tasks.maximum
            – mapred.tasktracker.reduce.tasks.maximum
        • Recommendations:
            – Map slots: 0.75 * number of cores (minimum 1)
            – Reduce slots: 0.5 * number of cores (minimum 1)
        • Decrease map and reduce slots on CLDB nodes
        • Increase slots on nodes with more memory, disk, or network bandwidth
            – E.g. reducers are bandwidth-intensive
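
The per-node recommendation above is simple arithmetic, sketched here in Python. The function name `recommended_slots` is hypothetical, and rounding down is an assumption, since the slide does not say how fractional results should be rounded.

```python
def recommended_slots(cores):
    """Return (map_slots, reduce_slots) for a node with `cores` CPU cores.

    Follows the slide's rule of thumb: 0.75 * cores for map slots and
    0.5 * cores for reduce slots, each with a minimum of 1.
    Rounding down is an assumption; the slide does not specify rounding.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    map_slots = max(1, (3 * cores) // 4)
    reduce_slots = max(1, cores // 2)
    return map_slots, reduce_slots


if __name__ == "__main__":
    for cores in (1, 4, 8, 16):
        map_slots, reduce_slots = recommended_slots(cores)
        print(f"{cores} cores: "
              f"mapred.tasktracker.map.tasks.maximum={map_slots}, "
              f"mapred.tasktracker.reduce.tasks.maximum={reduce_slots}")
```

The result is only a starting point; the slide's remaining advice (fewer slots on CLDB nodes, more on well-provisioned nodes) still has to be applied by hand.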
    • Number of Task Slots on the Cluster
        • How many concurrent map and reduce tasks can run
        • In mapred-site.xml:
            – mapred.map.tasks (default number of map tasks per job; treated as a hint)
            – mapred.reduce.tasks (default number of reduce tasks per job)
        • Best parameter to tune
    • Memory Buffer Size
        • Memory a map task uses to buffer and sort its output before the shuffle
            – io.sort.mb
        • Set to about 2x block size
            – Use hadoop mfs to see block size
        • If set too low, the resulting spills will lower performance
            – Visible in MapR Metrics
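
As a rough illustration of the "about 2x block size" guideline, the sketch below converts a block size in bytes into a candidate `io.sort.mb` value. The function name and the 64 MB block size in the usage lines are hypothetical; read the real block size from your cluster as the slide suggests.

```python
import math

MB = 1024 * 1024


def suggested_io_sort_mb(block_size_bytes, factor=2.0):
    """Suggest an io.sort.mb value (in MB) as roughly `factor` x block size.

    The factor of 2 comes from the slide; rounding up to a whole
    megabyte is an assumption made for this sketch.
    """
    if block_size_bytes <= 0:
        raise ValueError("block_size_bytes must be positive")
    return math.ceil(factor * block_size_bytes / MB)


if __name__ == "__main__":
    # Hypothetical block size, used only to show the arithmetic.
    block_size = 64 * MB
    print(f"io.sort.mb = {suggested_io_sort_mb(block_size)}")
```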
    • JVM Size
        • Size of the child JVM that runs a map or reduce task
            – mapred.map.child.java.opts: set to about 2x io.sort.mb
            – mapred.reduce.child.java.opts: leave at default setting
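
The heap guideline for map tasks can be turned into a JVM option string as follows. This is a minimal sketch: the function name is hypothetical, and it assumes the heap is set through the standard `-Xmx` JVM flag and that nothing else needs to appear in the option string.

```python
def map_child_java_opts(io_sort_mb, factor=2):
    """Build a value for mapred.map.child.java.opts.

    Sizes the heap at about `factor` x io.sort.mb, per the slide.
    Assumes the heap limit is the only JVM option required.
    """
    if io_sort_mb <= 0:
        raise ValueError("io_sort_mb must be positive")
    heap_mb = int(io_sort_mb * factor)
    return f"-Xmx{heap_mb}m"


if __name__ == "__main__":
    # Hypothetical io.sort.mb value, used only to show the arithmetic.
    print(map_child_java_opts(128))
```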
    • Speculative Execution
        • Set to true:
            – mapred.map.tasks.speculative.execution
            – mapred.reduce.tasks.speculative.execution
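
To show what these two settings look like in mapred-site.xml, the sketch below renders them in the standard Hadoop `<configuration>`/`<property>` layout. The helper name `render_site_properties` is hypothetical; in practice you would merge the properties into your existing file rather than generate a new one.

```python
from xml.sax.saxutils import escape


def render_site_properties(properties):
    """Render a dict of name -> value as a Hadoop-style configuration document."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(str(value))}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# The two speculative-execution properties named on the slide.
SPECULATIVE_EXECUTION = {
    "mapred.map.tasks.speculative.execution": "true",
    "mapred.reduce.tasks.speculative.execution": "true",
}

if __name__ == "__main__":
    print(render_site_properties(SPECULATIVE_EXECUTION))
```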
    • ExpressLane™
    • ExpressLane™
        • Allows a small job to run when all slots are occupied
        • Applies only when the cluster is busy and the job meets the criteria specified in mapred-site.xml
        • Check the documentation for ExpressLane criteria
            – http://mapr.com/doc/display/MapR/ExpressLane
        • Note: jobs that fit the small-job definition but turn out to be larger than anticipated are killed and re-queued for normal execution
    • Label-Based Scheduling
    • Label-Based Scheduling
        • Restrict job execution to a set of nodes within the cluster
            – By hardware config, department, etc.
        • Admin applies label(s) to nodes
        • User specifies a label when submitting a job
        • Admin can specify a default/override label per queue
    • Label-Based Scheduling
        • On a JobTracker node, in mapred-site.xml:
            mapreduce.jobtracker.node.labels.file = <path to node-label mapping file>
            – Within the mapping file, each line uses the format:
                <node pattern/regex> <labels>
            – Examples:
                hadoop-prod-0*       qa
                /hadoop-prod-1.*/    sales, product, 4_disks
                hadoop-prod-2        12_disks, engineering
                hadoop-prod-3        big_ram, support
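
To make the mapping-file format concrete, here is a small Python sketch that parses lines of the form shown above and looks up the labels for a hostname. It illustrates the file format only and is not MapR's implementation. Several points are assumptions: a pattern wrapped in slashes is treated as a regular expression that must match the whole hostname, any other pattern is treated as a glob, labels are comma-separated, and a host matching several lines collects the labels from all of them.

```python
import fnmatch
import re


def parse_label_file(text):
    """Parse mapping-file text into a list of (pattern, labels) rules."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split(None, 1)  # pattern, then the rest of the line
        pattern = parts[0]
        labels = []
        if len(parts) == 2:
            labels = [s.strip() for s in parts[1].split(",") if s.strip()]
        rules.append((pattern, labels))
    return rules


def pattern_matches(pattern, hostname):
    """Assumption: /.../ marks a regex; anything else is a glob."""
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return re.fullmatch(pattern[1:-1], hostname) is not None
    return fnmatch.fnmatchcase(hostname, pattern)


def labels_for(hostname, rules):
    """Collect, in file order, the labels of every rule matching hostname."""
    found = []
    for pattern, labels in rules:
        if pattern_matches(pattern, hostname):
            for label in labels:
                if label not in found:
                    found.append(label)
    return found


# The example lines from the slide.
EXAMPLE = """\
hadoop-prod-0* qa
/hadoop-prod-1.*/ sales, product, 4_disks
hadoop-prod-2 12_disks, engineering
hadoop-prod-3 big_ram, support
"""

if __name__ == "__main__":
    rules = parse_label_file(EXAMPLE)
    for host in ("hadoop-prod-07", "hadoop-prod-12", "hadoop-prod-3"):
        print(host, labels_for(host, rules))
```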
    • Label-Based Scheduling
        • Specify a label when submitting a job on the hadoop command line:
            mapred.job.label = <label>
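
One common way to pass a property such as `mapred.job.label` at submission time is Hadoop's generic `-D name=value` option. The sketch below builds such a command line. It assumes the job's driver parses generic options (for example through `ToolRunner`); the jar name, class name, and paths in the usage lines are hypothetical.

```python
import shlex


def submit_command(jar, main_class, label, job_args=()):
    """Build a `hadoop jar` argument list that sets mapred.job.label via -D.

    Assumes the driver class handles Hadoop generic options, so the
    -D setting is placed before the job's own arguments.
    """
    if not label:
        raise ValueError("label must be non-empty")
    return ["hadoop", "jar", jar, main_class,
            f"-Dmapred.job.label={label}", *job_args]


if __name__ == "__main__":
    # Hypothetical jar, class, and paths, for illustration only.
    argv = submit_command("app.jar", "com.example.WordCount",
                          "big_ram", ["/in", "/out"])
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Returning an argument list rather than a single string lets the caller hand it directly to a process launcher without shell-quoting concerns.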
    • Label-Based Scheduling
        • Default label per queue
            – Examples:
                mapred.queue.<queue-name>.label = <label>
                mapred.queue.<queue-name>.label.policy = <PREFER_QUEUE | PREFER_JOB | AND | OR>
    • Questions