  1. PStorM: Profile Storage and Matching for Feedback-Based Tuning of MapReduce Jobs
     MMath Thesis Presentation by Mostafa Ead
     Supervised by Prof. Ashraf Aboulnaga
  2. Outline
     ● Hadoop MapReduce
     ● Tuning Hadoop Configuration Parameters
       ○ Rule-Based Approach
       ○ Feedback-Based Approach
     ● PStorM System Overview
     ● The Profile Matcher
       ○ Feature Selection
       ○ Similarity Measures
       ○ Matching Algorithm
     ● Evaluation
  3. The MapReduce Programming Model
     [Diagram: Input Splits 1-3 feed Map-1, Map-2, and Map-3; each map task partitions its output (P11-P32); the partitions are shuffled to Red-1 and Red-2, which write Output Splits 1 and 2. Data flows as <K1, V1> → <K2, V2> → <K2, list(V2)> → <K3, V3>.]
  4. Hadoop MapReduce
     ● Hadoop is an open-source Java implementation of the MapReduce model
     ● Hadoop configuration parameters, e.g.:
       ○ io.sort.mb = 100
       ○ mapred.compress.map.output = false
       ○ mapred.reduce.tasks = 1
     ● These parameters have a significant effect on the performance of an MR job
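     For illustration only (not from the thesis), the parameters listed above can be set programmatically on a job's Configuration before submission; the values below are simply the defaults shown on the slide:

       import org.apache.hadoop.conf.Configuration;

       // Minimal sketch: setting the slide's parameters before submitting a job.
       // The values are the Hadoop 1.x defaults, not tuning recommendations.
       Configuration conf = new Configuration();
       conf.setInt("io.sort.mb", 100);                        // map-side sort buffer size in MB
       conf.setBoolean("mapred.compress.map.output", false);  // compress intermediate map output?
       conf.setInt("mapred.reduce.tasks", 1);                 // number of reduce tasks
       // The configured job is then submitted, e.g. through the
       // org.apache.hadoop.mapreduce.Job API built on top of this Configuration.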
  5. Hadoop Configuration Parameters
     [Diagram of the map-side pipeline: Map-1 reads Input Split-1 off HDFS, applies the map function, and collects records into a memory buffer, where they are serialized, partitioned, sorted, optionally combined and compressed, spilled to disk, and the spills are merged into partitions P11 and P12.]
  6. Hadoop Configuration Parameters
     [Same diagram, annotated with io.sort.mb: the size of the map-side memory buffer.]
  7. Hadoop Configuration Parameters
     [Same diagram, annotated with io.sort.mb and mapred.compress.map.output: whether the spilled map output is compressed.]
  8. Hadoop Configuration Parameters
     ● A good setting of these parameters relies on:
       ○ The behaviour of the map and reduce functions
       ○ The cluster resources
     ● The configuration parameters also interact with one another:
       ○ e.g. io.sort.record.percent and io.sort.mb
     [Diagram: the io.sort.mb buffer is split between record meta-data and the serialized intermediate records.]
  9. Rule-Based Optimizer
     ● An initial attempt is to capture the Hadoop administrator's expertise as a set of <rule, action> pairs:
       ○ Intermediate data size > input data size => enable compression
       ○ Reduce function is associative and commutative => enable the combiner
     ● This approach achieved good runtime speedups, but not for all MR jobs
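     A minimal sketch of how such <rule, action> pairs could be coded, assuming simple job statistics are available; this is illustrative, not the thesis implementation:

       import org.apache.hadoop.conf.Configuration;

       // Minimal sketch (not the thesis code): hard-coded <rule, action> pairs
       // applied to a job's configuration based on simple job statistics.
       final class RuleBasedOptimizer {
           static void tune(Configuration conf, long inputBytes, long intermediateBytes,
                            boolean reduceIsAssociativeCommutative) {
               // Rule: intermediate data larger than input data => enable map-output compression
               if (intermediateBytes > inputBytes) {
                   conf.setBoolean("mapred.compress.map.output", true);
               }
               // Rule: associative-commutative reduce => run the reduce function as a combiner too
               if (reduceIsAssociativeCommutative) {
                   // In a real job this means calling job.setCombinerClass(ReduceClass.class)
               }
           }
       }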
  10. Rule-Based Optimizer (RBO)
  11. Feedback-Based Tuning Approach
     ● Another attempt is to capture the effect of the program complexity and the cluster resources on the performance of the job in an execution profile
     ● The profile is fed back to an optimizer that provides cost-based recommendations
     ● This approach achieved better runtime speedups
  12. Feedback-Based Tuning Approach
  13. Starfish
     ● Starfish is an automatic feedback-based tuning system
     [Diagram: the first submission of a job collects its profile; subsequent submissions are tuned using that profile.]
  14. Starfish
     ● Starfish execution profile:
       ○ General: IO, CPU, Memory
       ○ Domain specific: runtimes of every phase in the map/reduce tasks
     ● Tuning workflow:
       ○ Apply dynamic instrumentation code to the job
       ○ Run the instrumented job with the default parameter settings and collect the execution profile
       ○ For the next submission of the same job, make the tuning decisions based on its execution profile
       ○ Run the job with the tuned parameter settings
  15. Starfish
     ● Same workflow as the previous slide, with the profile collection step highlighted: the profile collection overhead is 37% for the WCoP job
  16. Starfish
     ● Same workflow, additionally highlighting that Starfish does no profile reuse
  17. Profile Reuse
     ● MR jobs are likely to be similar to one another:
       ○ MR jobs are generated from a high-level language, e.g. PigLatin and HiveQL
       ○ Code reuse and refactoring
     ● Execution profile composition for new jobs:
       ○ J1: map-profile, reduce-profile
       ○ J2: map-profile, reduce-profile
       ○ J3: map function similar to J1's, reduce function similar to J2's
  18. Profile Reuse
     ● MR jobs are likely to be similar to one another:
       ○ MR jobs are generated from a high-level query language, e.g. PigLatin and HiveQL
       ○ Code reuse and refactoring
     ● Execution profile composition for new jobs:
       ○ J1: map-profile, reduce-profile
       ○ J2: map-profile, reduce-profile
       ○ J3: map-profile (reused from J1), reduce-profile (reused from J2)
  19. Profile Reuse Example
     ● Bigram Relative Frequency MR job:
       ○ Counts the frequency of each pair of consecutive words relative to the frequency of the first word in that pair
     ● Word Co-occurrence MR job:
       ○ Counts the co-occurrences of every pair of words within a sliding window of length n
     ● At n = 2:
       ○ Similar behaviour
       ○ Similar execution profiles
  20. Profile Reuse Example
  21. Challenge
     Given a repository of execution profiles of previously executed MR jobs, how can we automatically compose an execution profile that is useful for tuning the configuration parameters of a newly submitted job?
  22. Outline
     ● Hadoop MapReduce
     ● Tuning Hadoop Configuration Parameters
       ○ Rule-Based Approach
       ○ Feedback-Based Approach
     ● PStorM System Overview
     ● The Profile Matcher
       ○ Feature Selection
       ○ Similarity Measures
       ○ Matching Algorithm
     ● Evaluation
  23. PStorM: Profile Store and Matcher
     ● PStorM goals:
       ○ An extensible profile store
       ○ An accurate profile matcher that reuses the stored execution profiles to compose a matching profile for the submitted job, even for previously unseen jobs
       ○ The performance gains achieved by the feedback-based tuning system given the profile returned by PStorM should equal the gains achieved given the complete profile of the job
  24. System Overview
  25. Profile Matcher
     ● Profile matching is a domain-specific pattern recognition problem:
       a. Feature selection
       b. Similarity measures
       c. Matching algorithm
  26. Profile Matcher
  27. Sample Profile
     ● Dataflow fields (D):
       ○ Number of input records to the map/reduce tasks
     ● Cost fields (C):
       ○ Map/reduce phase times in the map/reduce tasks
     ● Dataflow statistics (DS):
       ○ Selectivity of the map/reduce functions in terms of size and number of records
     ● Cost statistics (CS):
       ○ CPU cost to process one input/intermediate record in the map/reduce tasks
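     As a purely hypothetical illustration of these four groups of fields, a profile could be represented as follows; the field names are illustrative, not Starfish's actual schema:

       // Hypothetical representation of the four groups of profile fields.
       class SampleProfile {
           // Dataflow fields (D)
           long mapInputRecords;
           long reduceInputRecords;
           // Cost fields (C)
           long mapPhaseTimeMs;
           long reducePhaseTimeMs;
           // Dataflow statistics (DS)
           double mapRecordSelectivity;    // output records / input records
           double mapSizeSelectivity;      // output bytes / input bytes
           // Cost statistics (CS)
           double mapCpuCostPerRecordNs;
           double reduceCpuCostPerRecordNs;
       }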
  28. Feature Selection
     [Diagram: profile feature groups — Job, D, C, DS, CS]
     ● Q: Given a MapReduce job and its sample profile, which features can distinguish the candidate matching profile from the other profiles stored in the Profile Store?
     ● Analytical models of the What-If engine
  29. Feature Selection
     [Diagram: the first submission of a job versus subsequent submissions in the tuning workflow.]
  30. Feature Selection
     [Diagram: profile feature groups — Job, D, C, DS, CS]
     ● Inputs to the analytical models:
       ○ Dataflow statistics
       ○ Cost statistics
       ○ Configuration parameter settings
         ■ Enumerated by the cost-based optimizer
     ● There is no need to find a matching profile whose D and C fields are similar to those of the complete profile of the submitted job
  31. Feature Selection
     [Diagram: profile feature groups — Job, DS, CS]
     ● The DS and CS features are obtained from the sample profile
     ● The selected features should be expected to have the same values among different samples of the same job, and different values among the profiles of other jobs
  32. Feature Selection
     [Diagram: profile feature groups — Job, DS, CS]
     ● Dataflow statistics are expected to have this characteristic
     ● Map selectivity in terms of the number of records:
       ○ Sort: = 1
       ○ Word Count: > 1
       ○ Word Co-occurrence Pairs: >> 1
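     For example, the map record selectivity can be derived from the standard MapReduce task counters; a rough sketch using the newer Hadoop API (illustrative only, not the thesis code):

       import org.apache.hadoop.mapreduce.Job;
       import org.apache.hadoop.mapreduce.TaskCounter;

       // Map record selectivity = map output records / map input records.
       // Sort ≈ 1, Word Count > 1, Word Co-occurrence Pairs >> 1.
       static double mapRecordSelectivity(Job job) throws Exception {
           long in  = job.getCounters().findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();
           long out = job.getCounters().findCounter(TaskCounter.MAP_OUTPUT_RECORDS).getValue();
           return in == 0 ? 0.0 : (double) out / in;
       }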
  33. Feature Selection
     [Diagram: profile feature groups — Job, DS, CS]
     ● CS features can vary between different samples of the same job
     ● The map CPU cost of the same job can differ between sample executions on over-utilized and under-utilized nodes
  34. Feature Selection
     [Diagram: profile feature groups — Job, DS, CS]
     ● Which features can be extracted from the bytecode of the submitted job and be useful for the matcher?
  35. Feature Selection
     [Diagram: profile feature groups — Job, DS, CS]
     ● MR jobs differ in their components and types:
       ○ Input formatter, input key type, input value type
       ○ Mapper, intermediate key type, intermediate value type
       ○ Reducer, output formatter, output key type, output value type
  36. Feature Selection
     [Diagram: profile feature groups — Job, DS, CS]
     ● We refer to these features as the static features
     ● A different input formatter results in a different IO cost to read the input records
  37. Feature Selection
     [Diagram: profile feature groups — Job, DS, CS]
     ● So far, the map/reduce functions have been analyzed as black boxes
     ● Static analysis of the bytecode of the map/reduce functions:
       ○ Control Flow Graphs (CFGs)
       ○ Different map/reduce CFGs result in different map/reduce CPU costs
  38. CFG Example
     [Figure: the map CFGs of Word Co-occurrence Pairs and Word Count.]
  39. CFG Example
     [Figure: the same two map CFGs, continued.]
  40. CFG Example
     [Figure: the same two map CFGs.]
     Different map CFGs => different map-phase times
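     For concreteness, the classic Word Count map function contains a single tokenization loop, whereas a Word Co-occurrence Pairs mapper adds a nested loop over the window, yielding a different CFG. The sketch below is the standard Word Count mapper, not the thesis code:

       import java.io.IOException;
       import java.util.StringTokenizer;
       import org.apache.hadoop.io.IntWritable;
       import org.apache.hadoop.io.LongWritable;
       import org.apache.hadoop.io.Text;
       import org.apache.hadoop.mapreduce.Mapper;

       // One simple loop => a small map CFG. A Pairs co-occurrence mapper adds an
       // inner loop over the window, giving a different CFG and per-record CPU cost.
       public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
           private static final IntWritable ONE = new IntWritable(1);
           private final Text word = new Text();

           @Override
           protected void map(LongWritable key, Text value, Context context)
                   throws IOException, InterruptedException {
               StringTokenizer tokens = new StringTokenizer(value.toString());
               while (tokens.hasMoreTokens()) {
                   word.set(tokens.nextToken());
                   context.write(word, ONE);
               }
           }
       }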
  41. Outline
     ● Hadoop MapReduce
     ● Tuning Hadoop Configuration Parameters
       ○ Rule-Based Approach
       ○ Feedback-Based Approach
     ● PStorM System Overview
     ● The Profile Matcher
       ○ Feature Selection
       ○ Similarity Measures
       ○ Matching Algorithm
     ● Evaluation
  42. Similarity Measures
     [Feature groups: Static, CFG, DS, CS]
     ● Matching the static features:
       ○ Feature values are all strings (categorical data)
       ○ Jaccard similarity index
       ○ Score range: [0, 1]
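     A minimal sketch of the Jaccard index over two sets of static-feature strings (illustrative):

       import java.util.HashSet;
       import java.util.Set;

       // Jaccard similarity = |A ∩ B| / |A ∪ B|, with a score in [0, 1].
       static double jaccard(Set<String> a, Set<String> b) {
           if (a.isEmpty() && b.isEmpty()) return 1.0;
           Set<String> intersection = new HashSet<>(a);
           intersection.retainAll(b);
           Set<String> union = new HashSet<>(a);
           union.addAll(b);
           return (double) intersection.size() / union.size();
       }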
  43. Similarity Measures
     [Feature groups: Static, CFG, DS, CS]
     ● Matching CFGs:
       ○ Synchronized breadth-first search
         ■ Both nodes are normal statements, or
         ■ Both nodes are branch statements (e.g. the condition of a loop)
       ○ Score range: {0, 1}
         ■ A conservative matching score
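     A simplified sketch of a synchronized breadth-first comparison of two CFGs, returning the conservative {0, 1} score; the CfgNode representation and the lockstep traversal details are assumptions, not the thesis implementation:

       import java.util.*;

       // Hypothetical CFG node: a statement kind (NORMAL or BRANCH) and its successors.
       final class CfgNode {
           enum Kind { NORMAL, BRANCH }
           final Kind kind;
           final List<CfgNode> successors = new ArrayList<>();
           CfgNode(Kind kind) { this.kind = kind; }
       }

       // Conservative score: 1 only if the two graphs look identical under a
       // synchronized BFS from their entry nodes, 0 otherwise.
       static int cfgMatch(CfgNode entryA, CfgNode entryB) {
           Deque<CfgNode[]> queue = new ArrayDeque<>();
           Set<CfgNode> expandedA = new HashSet<>();
           queue.add(new CfgNode[] { entryA, entryB });
           while (!queue.isEmpty()) {
               CfgNode[] pair = queue.poll();
               CfgNode a = pair[0], b = pair[1];
               if (a.kind != b.kind || a.successors.size() != b.successors.size()) return 0;
               if (!expandedA.add(a)) continue;   // already expanded this node in graph A
               for (int i = 0; i < a.successors.size(); i++) {
                   queue.add(new CfgNode[] { a.successors.get(i), b.successors.get(i) });
               }
           }
           return 1;
       }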
  44. Similarity Measures
     [Feature groups: Static, CFG, DS, CS]
     ● Matching DS and CS features:
       ○ Numerical features
       ○ Data normalization to bring all features to the same scale
       ○ Euclidean distance
       ○ Score range: [0, ∞)
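     A minimal sketch of min-max normalization followed by the Euclidean distance over the numerical DS/CS feature vectors (illustrative):

       // Normalize each feature using the range observed across the profile store,
       // then take the Euclidean distance between the two normalized vectors.
       static double normalizedEuclidean(double[] p, double[] q, double[] min, double[] max) {
           double sum = 0.0;
           for (int i = 0; i < p.length; i++) {
               double range = max[i] - min[i];
               double pi = range == 0 ? 0 : (p[i] - min[i]) / range;
               double qi = range == 0 ? 0 : (q[i] - min[i]) / range;
               sum += (pi - qi) * (pi - qi);
           }
           return Math.sqrt(sum);
       }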
  45. Matching Algorithm
     ● The feature vector is composed of features of mixed data types (categorical and numerical)
     ● Two possible matching algorithms:
       ○ Multi-stage matching
       ○ Machine learning approach
  46. Multi-Stage Matching
  48. Multi-Stage Matching
     ● The job profile is composed of independent map and reduce profiles
     ● The multi-stage matcher is therefore applied twice: once for the map profile and once for the reduce profile
     ● The matching map profile and reduce profile compose the final matching job profile
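     The staging itself is shown in figures that are not reproduced in this transcript. The sketch below assumes one plausible ordering (static features, then CFG, then dataflow statistics) and reuses the jaccard, cfgMatch, and normalizedEuclidean sketches above; the Candidate record and the 0.8 threshold are hypothetical:

       import java.util.*;

       // Hypothetical holder for one stored map (or reduce) profile's features.
       record Candidate(Set<String> staticFeatures, CfgNode cfgEntry, double[] dsFeatures) {}

       // Assumed staging (illustrative): static features -> CFG -> dataflow statistics.
       static Candidate multiStageMatch(Set<String> jobStatic, CfgNode jobCfg, double[] jobDs,
                                        List<Candidate> store, double[] min, double[] max) {
           double staticThreshold = 0.8;  // assumed cut-off, not from the thesis
           Candidate best = null;
           double bestDist = Double.POSITIVE_INFINITY;
           for (Candidate c : store) {
               if (jaccard(jobStatic, c.staticFeatures()) < staticThreshold) continue; // stage 1
               if (cfgMatch(jobCfg, c.cfgEntry()) == 0) continue;                      // stage 2
               double d = normalizedEuclidean(jobDs, c.dsFeatures(), min, max);        // stage 3
               if (d < bestDist) { bestDist = d; best = c; }
           }
           return best;  // null => no match; fall back to collecting a profile for this job
       }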
  49. Machine Learning Approach
     ● Generalized distance function:
       ○ A weighted sum of the distances/similarities calculated separately for each set of features of the same type
       ○ The weights should be learned
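     A one-method sketch of the generalized distance as a weighted sum of per-feature-group scores; the grouping shown in the comment is illustrative, and the weights are what the learning step must produce:

       // Weighted sum of per-group scores, e.g. groups: static, CFG, DS, CS.
       static double generalizedDistance(double[] groupScores, double[] weights) {
           double d = 0.0;
           for (int i = 0; i < groupScores.length; i++) {
               d += weights[i] * groupScores[i];
           }
           return d;
       }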
  50. Machine Learning Approach
     ● Training data set generation:
       ○ For every job Ji in the profile store, pick its profile Pi
       ○ Choose a random profile Pj from the profile store
       ○ Calculate the distances and similarities between Pi and Pj
       ○ Calculate T1: the predicted runtime of job Ji given the profile Pi
       ○ Calculate T2: the predicted runtime of job Ji given the profile Pj
       ○ D = |T1 - T2|
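     A sketch of the training-set loop described above; the profile type P, featurize, and predictRuntime (standing in for the What-If engine's cost model) are hypothetical parameters, not the thesis code:

       import java.util.*;
       import java.util.function.*;

       // For each stored profile Pi: pair it with a random stored profile Pj,
       // record the per-group distances/similarities as inputs and D = |T1 - T2| as the target.
       static <P> void buildTrainingSet(List<P> store,
                                        BiFunction<P, P, double[]> featurize,
                                        ToDoubleBiFunction<P, P> predictRuntime,
                                        List<double[]> xOut, List<Double> yOut) {
           Random rnd = new Random(42);
           for (P pi : store) {
               P pj = store.get(rnd.nextInt(store.size()));          // random stored profile
               xOut.add(featurize.apply(pi, pj));                    // input features
               double t1 = predictRuntime.applyAsDouble(pi, pi);     // job Ji with its own profile Pi
               double t2 = predictRuntime.applyAsDouble(pi, pj);     // job Ji with the random profile Pj
               yOut.add(Math.abs(t1 - t2));                          // target D = |T1 - T2|
           }
       }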
  51. Machine Learning Approach
     ● Machine learning algorithm:
       ○ Gradient Boosted Regression Trees (GBRT)
       ○ Profile matching implemented in R
     ● Profile matching using the learned model:
       ○ Extract the profile Ps of the submitted MR job
       ○ Calculate the similarities/distances between Ps and the profiles in PStorM, and the corresponding value of D
       ○ Select the PStorM profile whose D is the minimum
     ● PStorM uses the multi-stage matching algorithm
  52. Outline
     ● Hadoop MapReduce
     ● Tuning Hadoop Configuration Parameters
       ○ Rule-Based Approach
       ○ Feedback-Based Approach
     ● PStorM System Overview
     ● The Profile Matcher
       ○ Feature Selection
       ○ Similarity Measures
       ○ Matching Algorithm
     ● Evaluation
  53. Infrastructure
     ● 16 x Amazon EC2 c1.medium nodes:
       ○ 2 virtual cores
       ○ 1.7 GB of RAM
       ○ 350 GB of instance storage
     ● Hadoop cluster:
       ○ 1 master + 15 workers
       ○ Each worker can run at most 2 map and 2 reduce tasks concurrently
     ● PStorM profile store:
       ○ An HBase instance running on the master node
  54. Benchmark
  55. Evaluation
     ● Objectives:
       a. Profile matcher accuracy
       b. Profile matcher efficiency
          ■ The profile returned by PStorM should result in speedups comparable to those achieved given the complete profile of the submitted job
  56. Profile Matcher Accuracy
     ● Two content states of the profile store
     ● Same Data (SD) content state:
       ○ PStorM contains the profile collected during an execution on the same data set as the submitted job
     ● Different Data (DD) content state:
       ○ PStorM contains the profile collected during an execution on a different data set
  57. Profile Matcher Accuracy
     ● The evaluation metric is the number of correct matches as a fraction of the number of job submissions
     ● In the SD content state:
       ○ A correct match is the profile of the submitted job collected during an execution on the same data set
     ● In the DD content state:
       ○ A correct match is the profile of the submitted job collected during an execution on another data set
     ● The number of correct matches is calculated separately for the map and reduce profiles
  58. Profile Matcher Accuracy
     ● The accuracy of PStorM is compared to the accuracy of the alternative solutions
     ● PStorM's contributions at the matching level:
       ○ Feature selection:
         ■ A new set of features: static and CFG
         ■ Feature selection based on our domain knowledge
       ○ The multi-stage matching algorithm
  59. Profile Matcher Accuracy: Feature Selection
     ● Alternative feature selection approaches:
       ○ P-features:
         ■ Given the sample profile of the submitted job
       ○ SP-features:
         ■ Given the static features we proposed plus the sample profile of the submitted job
     ● For both approaches:
       ○ Rank the features according to their information gain
       ○ Select the highest-ranked F features, where F is the number of features used by PStorM
  60. Profile Matcher Accuracy: Feature Selection
  61. Profile Matcher Accuracy: Matching Algorithm
     ● PStorM uses the multi-stage matching algorithm
     ● The alternative is the machine learning approach:
       ○ GBRT has multiple configuration parameters
       ○ We ran four trials with different parameter settings and report the one that resulted in the highest matching accuracy for GBRT
  62. Profile Matcher Accuracy: Matching Algorithm
  63. Profile Matcher Efficiency
     ● Runtime speedup is the factor that matters most
     ● A third content state, NJ:
       ○ The submitted job has not been executed before on the cluster
       ○ This highlights the benefits of profile reuse
  64. Profile Matcher Efficiency
     [Runtime results chart; recovered labels and values: Default, 12, 824, 100, 302.]
  65. Conclusion
     ● Hadoop configuration parameters and their effect on the performance of MR jobs
     ● Robustness and efficiency of the feedback-based tuning approach
     ● Drawbacks: profiling overhead and no profile reuse
     ● PStorM: a profile store and matcher that leverages the idea of profile reuse
     ● PStorM resulted in significant speedups, even for new jobs
