
Yarn Resource Management Using Machine Learning


HadoopCon 2016 in Taiwan - Maximizing the utilization of Hadoop computing power is the biggest challenge for Hadoop administrators. In this talk I explain how we use machine learning to build a prediction model for computing-power requirements and set the MapReduce scheduler parameters dynamically, to fully utilize our Hadoop cluster's computing power.



  1. 1. YARN Resource Management Using Machine Learning TrendMicro 劉一正 Tony Liu
  2. 2. About Me •  劉一正 Tony Liu •  TrendMicro Staff Engineer •  Big Data platform Administrator •  TSMC Big Data Consultant Project •  Keep improving Big Data platform •  tony_liu@trend.com.tw; ojavajava@gmail.com
  3. 3. Agenda •  Questions About YARN •  The ways to find the answers •  YARN resource consumption prediction •  Conclusion
  4. 4. Questions about YARN — YARN Fair Scheduler •  What is the proper setting for containers? •  What are the characteristics of the jobs running in the cluster? •  How do we properly allocate resources to queues? •  Why does the cluster have resources but still has pending jobs?
  5. 5. The ways to find the answers •  Appropriate configurations for Container •  CPU bound / IO bound •  Queue resource consumption in the cluster •  Predict and allocate resources [Roadmap: Container Setting → Job Characteristics → Properly Allocate Resource to Queue → Resource Prediction]
  6. 6. My Thinking •  Container Setting / Job CPU vs. IO bound: correct container setting; what are the primary constraints; number of containers in the cluster; memory calculation •  Queue Status: queue status in the cluster; allocate resources by job SLA; pending jobs and unused resources in a queue; bottleneck resources •  Prediction: classify job type (CPU bound or IO bound); predict resource consumption; allocate unused resources to queues according to job type
  7. 7. Appropriate configurations for Container •  Appropriate configurations for Container •  CPU bound / IO bound •  Queue resource consumption in the cluster •  Predict and allocate resources [Roadmap: Container Setting → Job Characteristics → Properly Allocate Resource to Queue → Resource Prediction]
  8. 8. Appropriate configurations for Container •  Total available resources: available vmem = total memory − reserved memory; available vcores = total CPU − reserved CPU •  Number of YARN containers (concurrent processing) = min(vcores, 2 × disks) •  RAM per container = max(2 GB, total available memory / number of containers) (* reserved: for system and HBase) [Diagram: YARN container inside the Node Manager; Scheduler; Map/Reduce; AM]
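The sizing heuristics above can be sketched in a few lines of Python. The function name and the reserve defaults are illustrative, not part of the talk:

```python
def container_plan(total_mem_gb, total_vcores, disks,
                   reserved_mem_gb=8, reserved_vcores=2):
    """Illustrative container sizing per the heuristics above."""
    avail_mem = total_mem_gb - reserved_mem_gb       # reserved for system / HBase
    avail_vcores = total_vcores - reserved_vcores
    containers = min(avail_vcores, 2 * disks)        # concurrency bound
    ram_per_container_gb = max(2, avail_mem // containers)  # 2 GB floor
    return containers, ram_per_container_gb
```

For example, a node with 128 GB RAM, 32 vcores and 12 disks would get 24 containers of 5 GB each under these defaults.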
  9. 9. Appropriate configurations for Container — YARN NodeManager resources •  yarn.nodemanager.resource.memory-mb = containers × RAM per container = total available vmem •  yarn.nodemanager.resource.cpu-vcores = total cores − reserved cores = total available vcores
  10. 10. Appropriate configurations for Container — YARN Scheduler •  yarn.scheduler.minimum-allocation-mb = RAM per container •  yarn.scheduler.maximum-allocation-mb = containers × RAM per container •  yarn.scheduler.minimum-allocation-vcores = 1 •  yarn.scheduler.maximum-allocation-vcores = total available cores
  11. 11. Appropriate configurations for Container — Map •  mapreduce.map.memory.mb = RAM per container •  mapreduce.map.java.opts = 0.8 × RAM per container •  mapreduce.map.cpu.vcores = 1 •  mapreduce.map.disk = 0.5
  12. 12. Appropriate configurations for Container — Reduce •  mapreduce.reduce.memory.mb = 2 × RAM per container •  mapreduce.reduce.java.opts = 0.8 × (2 × RAM per container) •  mapreduce.reduce.cpu.vcores = 1 •  mapreduce.reduce.disk = 1.33
  13. 13. Appropriate configurations for Container — AM •  yarn.app.mapreduce.am.resource.mb = 2 × RAM per container •  yarn.app.mapreduce.am.command-opts = 0.8 × (2 × RAM per container) •  yarn.app.mapreduce.am.resource.cpu-vcores = 1
  14. 14. Container Size – Memory Calculation r = requested memory. The logic works like this: a. take max(requested resource, minimum resource) = max(768, 512) = 768 b. roundUp(768, stepFactor) = roundUp(768, 512) = 1024; roundUp does ((768 + (512 − 1)) / 512) * 512 with integer division c. min(roundUp result, maximum resource) = min(1024, 1024) = 1024. So the final allocated memory is 1024 MB, which is what you are getting.
  15. 15. Container Size – Memory Calculation •  A map task asking for 1500 MB per map container: mapreduce.map.memory.mb = 1500, yarn.scheduler.minimum-allocation-mb = 1024 •  The RM will allocate a 2048 MB container (2 × yarn.scheduler.minimum-allocation-mb)
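The normalization the RM applies can be sketched as below (a minimal sketch; the step factor is the scheduler's allocation increment, which here equals the minimum allocation):

```python
def normalize(requested_mb, minimum_mb, step_mb, maximum_mb):
    """Round a memory request the way the RM does: clamp to the minimum,
    round up to a multiple of the step, cap at the maximum."""
    r = max(requested_mb, minimum_mb)
    r = ((r + step_mb - 1) // step_mb) * step_mb   # integer round-up
    return min(r, maximum_mb)
```

This reproduces both worked examples: a 768 MB request normalizes to 1024 MB, and a 1500 MB request normalizes to 2048 MB.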
  16. 16. How Many Containers Launch •  Map tasks: one per map split (HDFS block size) •  Data locality: data-local, rack-local, any other NM •  The Application Master will re-attempt failed tasks; 4 failed attempts → the task fails •  The AM requests resources from the Resource Manager; if the AM stops sending heartbeats, the RM will re-attempt it; 2 failures → the whole application fails •  Reducer tasks: mapred.job.reduces parameter •  Reducers can be given resources before all the map tasks complete (mapreduce.job.reduce.slowstart.completedmaps) •  Risks: wasting resources on processes that are waiting for work, and potentially creating a deadlock when resources are constrained in a shared environment
  17. 17. Observe the configuration •  Observe which configuration is best for you through TeraGen and TeraSort:
hadoop jar $HADOOP_PATH/hadoop-examples.jar teragen -Dmapreduce.job.maps=$i -Dmapreduce.map.memory.mb=$k -Dmapreduce.map.java.opts.max.heap=$MAP_MB
hadoop jar $HADOOP_PATH/hadoop-examples.jar terasort -Dmapreduce.job.maps=$i -Dmapreduce.job.reduces=$j -Dmapreduce.map.memory.mb=$k -Dmapreduce.map.java.opts.max.heap=$MAP_MB -Dmapreduce.reduce.memory.mb=$k -Dmapreduce.reduce.java.opts.max.heap=$RED_MB
  18. 18. Container Resource Requirement Testing •  Appropriate configurations for Container •  CPU bound / IO bound •  Queue resource consumption in the cluster •  Predict and allocate resources [Roadmap: Container Setting → Job Characteristics → Properly Allocate Resource to Queue → Resource Prediction]
  19. 19. Job Characteristics •  A container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, CPU, etc.) •  Different jobs place different workloads on the cluster, including CPU-bound and I/O-bound •  So, what are the characteristics of the jobs running in the cluster?
  20. 20. Job Characteristics •  Reference: Tian et al. (2009) investigate the characteristics of MapReduce jobs in a practical data center •  They define a classification model to determine whether a MapReduce job is CPU-bound or I/O-bound
  21. 21. Job Characteristics •  The Map-Shuffle phase performs five actions: 1) read input data 2) compute the map task 3) store output results to local disk 4) shuffle map task result data out 5) shuffle reduce input data in
  22. 22. Job Characteristics •  According to the utilization of I/O and CPU, workloads are classified on the Map-Shuffle phase of MapReduce •  MID: map input data •  MOD: map output data •  SOD: shuffle-out data (= MOD) •  SID: shuffle-in data •  MTCT: map task completed time •  DIOR: disk I/O rate (DFSIO I/O rate) •  n: number of YARN containers (concurrent processing)
  23. 23. Job Characteristics [Slide shows the formula that classifies a job as CPU-bound or I/O-bound; DIOR is measured with DFSIO]
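The exact classification formula is in a figure that did not survive extraction. Purely as a hypothetical reading of the attributes defined on the previous slide, one plausible criterion compares the data rate a job demands in its Map-Shuffle phase against what the cluster's disks can deliver (units and threshold are assumptions, not the talk's formula):

```python
def is_io_bound(mid, mod, sid, mtct_s, n, dior_mb_s):
    """Hypothetical criterion: a job is I/O-bound when the Map-Shuffle
    phase moves data faster than n containers' disks can sustain.
    mid/mod/sid in bytes, mtct_s in seconds, dior_mb_s in MB/s."""
    sod = mod                                   # shuffle-out data = map output
    total_io_mb = (mid + mod + sod + sid) / 1e6
    return total_io_mb / mtct_s > n * dior_mb_s
```

A job moving 4 GB in 10 seconds against 10 containers at 5 MB/s each would classify as I/O-bound under this reading.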
  24. 24. Job Characteristics •  Data source: Job history log

Program                    MID        MOD         MTCT
myspn_top_cve              1395184    620928      15185
myspn_top_url              54481169   52528135    9867
aggregate_url              286007534  1155960828  420225
USandbox Data Statistic    37612436   4921787     45423
file-solr-daily            75167686   4660452644  224488
aggregate_url_dedupe       639896245  561632270   73926
myspn_top_url_by_origin    499348380  506962079   53927
  25. 25. Job Characteristics •  Data source: Job history log •  Test data set: 5,942 •  Test mode: split 66% train, remainder test •  Classifier model: RandomForest •  Attributes: MID, MOD, MTCT, n, dior, label

=== Summary ===
Correlation coefficient        0.9934
Mean absolute error            0.0099
Root mean squared error        0.0513
Relative absolute error        2.4872 %
Root relative squared error    11.4997 %
Total number of instances      2020
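The error figures in these Weka-style summaries are standard definitions; a minimal sketch of how the MAE and RMSE lines are computed from held-out predictions (function name is illustrative):

```python
import math

def mae_rmse(actual, predicted):
    """Mean absolute error and root mean squared error over a test set."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, rmse
```

The relative variants reported above divide these by the errors of a baseline that always predicts the mean of the training labels.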
  26. 26. Job Characteristics [Chart: number of jobs per queue (0–1200), split into I/O-bound vs. CPU-bound]
  27. 27. Queue Type •  I/O Bound: domain_census, myrep, pathcensus •  CPU Bound: alps, census, census-oozie, data_importer, domain_census-oozie, domain_census_ews, hdfs, magicQ, myspn, platinum, platinum-oozie, retroscan, retrosplunk, rnu, spnungle, threatconnect, threathub, threathub-oozie, user
  28. 28. Thinking •  Besides allocating resources based on the job's SLA, what other factors should I consider? - Job characteristics? - Queue type?
  29. 29. Queue Resource Consumption •  Appropriate configurations for Container •  CPU bound / IO bound •  Queue resource consumption in the cluster •  Predict and allocate resources [Roadmap: Container Setting → Job Characteristics → Properly Allocate Resource to Queue → Resource Prediction]
  30. 30. Cluster Resource Allocation •  YARN fair scheduler - yarn.scheduler.fair.allocation.file → fair-scheduler.xml •  The allocation file is reloaded every 10 seconds, allowing changes to be made on the fly.
  31. 31. Cluster Resource Allocation •  Fair Scheduler - default queue: root - Hierarchical queues - placement policy - preemption - resource reserved •  Cluster resource - FairShare memory: x, vcores: y
  32. 32. Cluster Resource Allocation — Queue properties •  minResources (soft limit) •  maxResources (hard limit) •  weight (default 1.0) •  maxRunningApps •  schedulingPolicy: fifo / fair / drf [Diagram: example queue hierarchy under the YARN root — Research, Production, Service, Marketing, Report, adhoc]
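These properties map directly onto entries in fair-scheduler.xml. A minimal illustrative fragment (queue names and numbers are made up for this sketch):

```xml
<allocations>
  <queue name="production">
    <minResources>20000 mb,20 vcores</minResources>  <!-- soft limit -->
    <maxResources>80000 mb,80 vcores</maxResources>  <!-- hard limit -->
    <weight>2.0</weight>
    <maxRunningApps>50</maxRunningApps>
    <schedulingPolicy>drf</schedulingPolicy>
  </queue>
  <queue name="adhoc">
    <weight>1.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
  </queue>
</allocations>
```

Because the file is reloaded every 10 seconds, rewriting it is also the mechanism used later in the talk to reallocate queue resources dynamically.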
  33. 33. Analyzing Cluster Status •  Retrieve YARN metrics from the YARN REST APIs •  FileSystemCounter •  JobCounters •  Task Counters
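A minimal sketch of polling the ResourceManager REST API for the cluster-wide counters used in the charts that follow (the RM hostname is a placeholder; `/ws/v1/cluster/metrics` is the standard RM metrics endpoint):

```python
import json
from urllib.request import urlopen

RM = "http://resourcemanager:8088"   # placeholder ResourceManager address

def cluster_metrics(rm=RM):
    """Fetch cluster-wide counters (appsPending, allocatedVirtualCores, ...)."""
    with urlopen(rm + "/ws/v1/cluster/metrics") as resp:
        return json.load(resp)["clusterMetrics"]

def vcore_utilization(metrics):
    """Fraction of vcores in use, from a clusterMetrics dict."""
    return metrics["allocatedVirtualCores"] / metrics["totalVirtualCores"]
```

Sampling these endpoints on a schedule yields exactly the time series plotted on the next slides (pending apps, vcore and vmemory utilization).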
  34. 34. Pending Apps and Available Vcores [Chart: appsPending vs. availableVCores over time, 03:20–10:50]
  35. 35. Vcores Utilization [Chart: total_vCores vs. used_vCores over time, 03:20–10:50]
  36. 36. Vmemory Utilization [Chart: total_memory vs. used_memory over time, 03:20–10:50]
  37. 37. Cluster Resource Utilization [per-queue utilization charts]
  38. 38. Cluster Resource Utilization [per-queue utilization charts, continued]
  39. 39. Cluster Resource Utilization [per-queue utilization charts, continued]
  40. 40. Bottleneck Resource •  Vcores become the bottleneck resource: Memory usage 41.5%, VCores usage 99.5%
  41. 41. Over Fair Share •  Cluster still has resources
  42. 42. Over Fair Share
  43. 43. Thinking •  Why can't the cluster's resources be fully utilized? •  Is there any resource limitation? (bottleneck) •  How can we reduce pending jobs when the cluster still has resources?
  44. 44. Thinking •  Is it possible to predict when the cluster will have pending jobs? •  Can I predict the resource consumption at a specific time and dynamically allocate it to fully utilize the cluster's resources?
  45. 45. Predict Resource Consumption and Allocate Resources •  Appropriate configurations for Container •  CPU bound / IO bound •  Queue resource consumption in the cluster •  Predict and allocate resources [Roadmap: Container Size → Job Characteristics → Properly Allocate Resource to Queue → Resource Prediction]
  46. 46. YARN resource consumption prediction: Collect Metrics → Data Processing (pre-processing) → Training Model → Evaluate (RMSE) → Model Prediction → Predict Queue Consumption
  47. 47. Training Data •  Data source: Job history log

Field                     Description                           Process
date                      date                                  ignore
time                      hour: 0–23                            feature
working day               0: working day, 1: non-working day    feature
weekday                   day of week                           feature
cluster_appsPending       pending apps in the cluster           feature
cluster_appsRunning       running apps in the cluster           feature
cluster_availableMB       available vmem in the cluster         feature
cluster_allocatedMB       allocated vmem in the cluster         feature
cluster_availableVcore    available vcores in the cluster       feature
cluster_allocatedVcore    allocated vcores in the cluster       feature
  48. 48. Training Data

Field                   Description                Process
queue_name              queue name                 feature
minResources_memory     min vmem for queue         feature
minResources_vcores     min vcores for queue       feature
maxResources_memory     max vmem for queue         feature
maxResources_vcores     max vcores for queue       feature
numPendingApps          pending apps in queue      feature
numActiveApps           running apps in queue      feature
usedResources.memory    used vmem in queue         feature
usedResources.vcore     used vcores in queue       feature
label                   label (predict target)     label
  49. 49. Training Model •  Training Model: Linear Regression •  Predict: vcore
  50. 50. Training Model •  Training model: RandomForest •  Predict: vcore •  Data source: Job history log •  Test data set: 109,736 •  Test mode: split 66% train, remainder test •  Attributes: 19 === Summary === Correlation coefficient 0.999 Mean absolute error 0.1262 Root mean squared error 0.8494 Relative absolute error 1.5905 % Root relative squared error 4.5017 % Total Number of Instances 37,310
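The "split 66% train, remainder test" protocol used throughout these evaluations can be sketched with the standard library (Weka performs the equivalent internally; the seed is arbitrary):

```python
import random

def split_66(rows, seed=42):
    """Shuffle and split rows into 66% train / remainder test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * 0.66)
    return rows[:cut], rows[cut:]
```

With the 109,736-row data set above, this yields roughly the 37,310 held-out instances reported in the summary.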
  51. 51. Training Model •  Training Model: Linear Regression •  Predict: vmemory
  52. 52. Training Model •  Training model: RandomForest •  Predict: vmemory •  Data source: Job history log •  Test data set: 109,736 •  Test mode: split 66% train, remainder test •  Attributes: 19 === Summary === Correlation coefficient 0.9995 Mean absolute error 0.0003 Root mean squared error 0.0019 Relative absolute error 1.4174 % Root relative squared error 3.2014 % Total Number of Instances 37,310
  53. 53. Training Model •  Training model: RandomForest •  Predict: Pending job •  Data source: Job history log •  Test data set: 122,120 •  Test mode: split 66% train, remainder test •  Attributes: 19 === Summary === Correlation coefficient 0.9917 Mean absolute error 0.0002 Root mean squared error 0.0054 Relative absolute error 7.9308 % Root relative squared error 14.4934 % Total Number of Instances 41,521
  54. 54. Attribute Evaluation •  Predict: Pending jobs •  Attribute Evaluator: Information Gain •  Ranked attributes:

Attribute              Score
maxResource_memory     1.14465
maxResource_vcore      1.04186
usedResource_memory    0.53004
usedResource_vcore     0.51167
minResource_memory     0.47563
numActiveApps          0.34418
minResource_vcore      0.3179
  55. 55. Experiment Result •  According to the prediction results, we reallocate the resources of queues that may have pending jobs on a specific weekday •  Experiment result: pending jobs reduced by 82% (pending-jobs ratio: before 0.005, after 0.0009)
  56. 56. Experiment Result •  Something you should know: - The sum of the queues' minResources should be less than the cluster's fair share - A queue may not get its minResources immediately - Preemption kills containers in other queues to satisfy minResources, which also wastes resources
  57. 57. Experiment Result •  Something you should know: - Modifying fair-scheduler.xml too frequently may make the ResourceManager behave strangely - Failing over the ResourceManager will cause jobs submitted by Oozie to retry - Does a resource-tight cluster need resource prediction?
  58. 58. Conclusion •  A deep understanding of the architecture is the key to tuning and management •  Ask whether other tools, even from a different domain, could help your daily job •  Machine learning has been used in many domains for prediction; it can definitely give you a different perspective
  59. 59. Q & A
  60. 60. Thank You
