Using Derivative-Free Optimization
in the Hadoop Cluster
with TeraSort
Renato dos Santos Alves & Sarosh Farjam
Projeto de Experimentos (Design of Experiments) ~ 03.7.2014
Sequence
• Abstract
• Introduction
• Workload Analysis of Search Engines
• Benchmarking Methodology and Decisions
• Scalable Data Generation Tool
• Case Studies
• Conclusions
Introduction
• Running the MapReduce cluster benchmark
TeraSort under a DFO method
• Each iteration of the DFO method produces new values for
the Hadoop configuration parameters.
• Because these parameters are set inside the framework, we
need a tool that applies the configuration across the cluster so
that the TeraSort application runs correctly.
• Chef server and Chef client
TeraSort Benchmark
TeraSort includes 3 MapReduce
applications:
● TeraGen: generates the data.
● TeraSort: samples the input data
and uses the samples with MapReduce to
sort the data.
● TeraValidate: validates that the output is
sorted.
DFO Method
• Derivative-free optimization is a branch
of mathematical optimization.
• It covers problems for which derivative information is
unavailable, and methods that do not use derivatives.
• The derivative of a function of a real variable measures how
sensitive one quantity (the dependent variable) is to a change in
another quantity (the independent variable). E.g. the
derivative of the position of a moving object with respect to time is
the object's velocity.
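To make "methods that do not use derivatives" concrete, here is a minimal sketch of a compass (coordinate) search, one of the simplest derivative-free methods. It is neither BOBYQA nor COBYLA and is not taken from the slides; the names `compass_search` and `toy_bowl` and the toy objective are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Objective type: returns f(x) for an n-dimensional point. No gradient is used. */
typedef double (*objective_fn)(size_t n, const double *x);

/*
 * Compass search (illustrative only): try a step of +/-step along each
 * coordinate, keep any move that lowers f, and halve the step when no
 * coordinate move helps. x is updated in place and kept inside [lb, ub].
 * Returns the lowest objective value found.
 */
double compass_search(objective_fn f, size_t n, double *x,
                      const double *lb, const double *ub,
                      double step, double min_step, int max_evals)
{
    assert(f != NULL && x != NULL && lb != NULL && ub != NULL && n > 0);

    double best = f(n, x);
    int evals = 1;

    while (step > min_step && evals < max_evals) {
        int improved = 0;
        for (size_t i = 0; i < n; ++i) {
            for (int dir = -1; dir <= 1 && evals < max_evals; dir += 2) {
                double old = x[i];
                double trial = old + dir * step;
                if (trial < lb[i]) trial = lb[i];   /* respect lower bound */
                if (trial > ub[i]) trial = ub[i];   /* respect upper bound */
                if (trial == old) continue;         /* already at the bound */

                x[i] = trial;
                double val = f(n, x);
                ++evals;
                if (val < best) {                   /* keep the improvement */
                    best = val;
                    improved = 1;
                    break;
                }
                x[i] = old;                         /* undo the failed move */
            }
        }
        if (!improved) step *= 0.5;                 /* refine the search */
    }
    return best;
}

/* Hypothetical toy objective: a bowl with its minimum at (3, -1). */
double toy_bowl(size_t n, const double *x)
{
    (void)n;
    return (x[0] - 3.0) * (x[0] - 3.0) + (x[1] + 1.0) * (x[1] + 1.0);
}
```

Only function values are compared, so the same loop would work when each evaluation is an expensive black box such as a full benchmark run, although real DFO algorithms use far fewer evaluations.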
Algorithm BOBYQA
• BOBYQA (Bound Optimization BY Quadratic Approximation) is
a numerical optimization algorithm by Michael J. D. Powell.
• It is also the name of Powell's Fortran 77 implementation of the algorithm.
• BOBYQA solves bound-constrained optimization problems without
using derivatives of the objective function, which makes it
a derivative-free algorithm.
• The algorithm solves the problem using a trust region method that
forms quadratic models by interpolation. One new point is
computed on each iteration, usually by solving a trust region
subproblem, subject to the bound constraints.
Algorithm COBYLA
• Constrained Optimization BY Linear Approximation
(COBYLA) is a numerical optimization method
for constrained problems where the derivative of the
objective function is not known.
• It was invented by Michael J. D. Powell
while he was working for Westland Helicopters.
• COBYLA proceeds by iteratively approximating the
actual constrained optimization problem with linear
programming problems.
Hadoop Environment
• A physical cluster with 29 nodes was used:
• 1 Hadoop master server (running the
JobTracker and NameNode services)
• 28 Hadoop slaves (dedicated to the
TaskTracker and DataNode services)
• 2 Gigabit Ethernet provides the connectivity
between the 29 nodes
Hadoop Environment
• A front-end server gives access to the cluster.
It is also configured as the Chef Server and is
used to orchestrate the DFO TeraSort runs:
it synchronizes the steps of the DFO method
and updates the Hadoop parameter settings
after each iteration of the DFO TeraSort
method.
Experiment Execution
• Nemesis, a server that is not part of the cluster, is used as the front end for
running the TeraSort application, running the DFO method and updating the
Hadoop settings based on its output.
• The TeraSort runs and the Hadoop updates are synchronized with the output
of the DFO method by the dfo_hadoop_terasort application, which runs on the
front-end server.
• The dfo_hadoop_terasort application is given an input file that
contains the initial values of the Hadoop configuration parameters, constraints
that keep these values out of unwanted ranges, an output value for the objective
function, a tolerance for the constraints and the maximum number of iterations.
By processing the input file and interacting with the Hadoop cluster, the application
discovers which parameter values have the greatest impact on TeraSort execution
time, and writes a file with the best configuration parameters
found.
Experiment Execution
• The cluster has 28 slave servers with two processors each,
for a total of 56 processing slots. We decided to keep 10% of this
total free for tasks that have to be executed more than once
because of failures. The input was about 100 GB of data generated
by Hadoop TeraGen.
• To compare how well the job execution time is optimized, two
DFO methods were run, BOBYQA and COBYLA, to identify
which one best suits the TeraSort application supplied with
Hadoop.
• Each algorithm was run twice, with up to 50 iterations, to identify at
which point the runs converge to a better
runtime.
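The slot arithmetic in the first bullet can be checked with a few lines of C. This is only a sketch: the function names are made up here, and rounding the 10% reserve up to a whole slot is an assumption, since the slides only state the percentage.

```c
#include <assert.h>

/* Total processing slots: one slot per processor on every slave server. */
int total_slots(int servers, int slots_per_server)
{
    assert(servers >= 0 && slots_per_server >= 0);
    return servers * slots_per_server;
}

/*
 * Slots left for regular tasks after reserving `percent` of the total.
 * Assumption: the reserve is rounded up to a whole slot.
 */
int usable_slots(int total, int percent)
{
    assert(total >= 0 && percent >= 0 && percent <= 100);
    int reserved = (total * percent + 99) / 100;  /* ceiling of total*percent/100 */
    return total - reserved;
}
```

With the figures from the slide, `total_slots(28, 2)` gives 56 and `usable_slots(56, 10)` reserves 6 slots under this rounding assumption.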
Switching COBYLA, BOBYQA
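/* COBYLA algorithm: Constrained Optimization BY Linear Approximations */
opt = nlopt_create(NLOPT_LN_COBYLA, N);
/* To switch to BOBYQA, create the optimizer with this line instead: */
// opt = nlopt_create(NLOPT_LN_BOBYQA, N);
nlopt_set_lower_bounds(opt, lb);   /* lower bound for each parameter */
nlopt_set_upper_bounds(opt, ub);   /* upper bound for each parameter */
/* objetivo is maximized here; NLopt also provides nlopt_set_min_objective */
nlopt_set_max_objective(opt, objetivo, NULL);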
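The `objetivo` callback registered above has to match NLopt's objective signature, `double f(unsigned n, const double *x, double *grad, void *data)`. For derivative-free algorithms such as COBYLA and BOBYQA, NLopt passes `grad` as NULL. The slides do not show the body of `objetivo`, so the sketch below is an assumption throughout: it returns the negated run time so that maximizing the objective favors faster runs, and it replaces the real cluster run with a made-up model, `fake_terasort_seconds`. The `run_record` struct is likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping passed through NLopt's void *data pointer. */
typedef struct {
    int evaluations;       /* how many times the objective was called */
    double best_seconds;   /* shortest run time seen so far */
} run_record;

/*
 * Hypothetical stand-in for "apply parameters x to Hadoop, run TeraSort,
 * return the elapsed seconds". The real application would run the job.
 */
double fake_terasort_seconds(unsigned n, const double *x)
{
    double t = 1200.0;
    for (unsigned i = 0; i < n; ++i) {
        double d = x[i] - 8.0;
        t += 5.0 * d * d;
    }
    return t;
}

/* Objective with NLopt's callback signature; grad is unused (derivative-free). */
double objetivo(unsigned n, const double *x, double *grad, void *data)
{
    (void)grad;
    assert(x != NULL);

    double seconds = fake_terasort_seconds(n, x);

    run_record *rec = (run_record *)data;
    if (rec != NULL) {
        if (rec->evaluations == 0 || seconds < rec->best_seconds)
            rec->best_seconds = seconds;
        rec->evaluations += 1;
    }
    return -seconds;   /* assumption: maximizing -time minimizes run time */
}
```

To use the record, the last argument of `nlopt_set_max_objective` would be a pointer to a `run_record` instead of NULL.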
Experiment Execution
Commands
• First: generate the TeraSort input data
• TeraGen will generate approximately 100 GB (100 000 179 688 bytes)
• $ hadoop jar $HADOOP_HOME/hadoop-*examples*.jar teragen <number of 100-byte rows> <output dir>
• $ hadoop jar hadoop-examples-1.0.4.jar teragen 1000000000 terasort-input
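The row count passed to teragen follows from the fixed 100-byte row size named in the usage line. A small sketch of that arithmetic, with helper names that are assumptions of this example:

```c
#include <assert.h>
#include <limits.h>

#define TERAGEN_ROW_BYTES 100ULL   /* teragen writes 100-byte rows */

/* Number of whole rows needed to produce target_bytes of row data. */
unsigned long long teragen_rows(unsigned long long target_bytes)
{
    return target_bytes / TERAGEN_ROW_BYTES;
}

/* Row data produced by a given number of rows. */
unsigned long long teragen_bytes(unsigned long long rows)
{
    assert(rows <= ULLONG_MAX / TERAGEN_ROW_BYTES);   /* avoid overflow */
    return rows * TERAGEN_ROW_BYTES;
}
```

For the command above, 1000000000 rows of 100 bytes amount to 100 000 000 000 bytes of row data, which is 100 GB in decimal units.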
Commands
• Second: run the DFO optimization
• [hadoop@nemesis otimizacao]$ nohup sudo time ./dfo_hadoop_terasort < entrada > log_execucao_terasort &
Results
• We used the DFO method with the BOBYQA and
COBYLA algorithms.
• The main difference between them is how much the
execution time varies from one job iteration to the next
under the dfo_hadoop_terasort application.
• This comes mainly from how each algorithm
approximates the objective function around the
sampled points: in quadratic or linear form,
respectively.
TeraSort 50 A
Iteration Time F (s)
1 1491
2 1501
3 1447
4 1889
5 2076
6 1466
7 1470
8 1319
9 1897
10 1611
11 1440
12 1588
13 1321
14 1897
15 1289
16 1704
17 1294
18 1313
19 1728
20 1971
21 1842
[Figure: TeraSort 50 A using BOBYQA, execution progress. x-axis: number of iterations; y-axis: time in seconds. Labeled value: 1289.]
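The value highlighted for this run is the lowest time in the table above. As a sketch of how the best iteration can be picked out of such a series, the following C fragment scans the 21 times of the TeraSort 50 A run; the function and array names are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>

/* Times in seconds copied from the TeraSort 50 A table (iterations 1 to 21). */
const int run_a_seconds[] = {
    1491, 1501, 1447, 1889, 2076, 1466, 1470,
    1319, 1897, 1611, 1440, 1588, 1321, 1897,
    1289, 1704, 1294, 1313, 1728, 1971, 1842
};
const size_t run_a_count = sizeof run_a_seconds / sizeof run_a_seconds[0];

/* Returns the 1-based iteration with the smallest time (first one on ties). */
size_t best_iteration(const int *seconds, size_t count)
{
    assert(seconds != NULL && count > 0);
    size_t best = 0;
    for (size_t i = 1; i < count; ++i) {
        if (seconds[i] < seconds[best])
            best = i;
    }
    return best + 1;
}
```

For this run the scan lands on iteration 15, whose time is 1289 seconds.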
TeraSort 50 C
Iteration Time F (s)
1 1587
2 1721
3 1473
4 1669
5 1801
6 1833
7 1486
8 1709
9 1510
10 1962
11 1988
12 1934
13 1898
14 2277
15 1933
16 1516
17 1601
18 1561
19 1639
20 1515
21 1507
22 2205
23 1838
24 2419
25 1744
26 1566
27 1619
28 1890
29 1988
30 1875
31 1620
32 1780
33 1607
34 1536
35 1621
36 1580
37 1626
38 1675
39 2065
[Figure: TeraSort 50 C using BOBYQA, execution progress. x-axis: number of iterations; y-axis: time in seconds. Labeled value: 1473.]
TeraSort 50 B
Iteration Time F (s)
1 1437
2 1343
3 1335
4 1228
5 1213
6 1240
7 1198
8 1203
9 1231
10 1178
11 1174
12 1187
13 1186
14 1204
15 1128
16 1150
17 1190
18 1165
19 1190
20 1208
21 1204
22 1113
23 1171
24 1185
25 1190
26 1170
27 1155
28 1211
29 1159
30 1198
31 1206
32 1144
33 1177
34 1179
35 1232
36 1157
37 1201
38 1150
39 1195
40 1178
41 1237
42 1196
43 1233
44 1356
45 1400
46 1674
47 1424
48 1365
49 1366
50 1320
[Figure: TeraSort 50 B using COBYLA, execution progress. x-axis: number of iterations; y-axis: time in seconds. Labeled value: 1113.]
TeraSort 50 New
Iteration Time F (s)
1 1442
2 1298
3 1285
4 1285
5 1274
6 1329
7 1343
8 1314
9 1289
10 1308
11 1304
12 1322
13 1345
14 1421
15 1369
16 1336
17 1348
18 1335
19 1333
20 1307
21 1367
22 1369
23 1352
24 1362
25 1390
26 1350
27 1324
28 1382
29 1347
30 1339
[Figure: TeraSort 50 New using COBYLA, execution progress. x-axis: number of iterations; y-axis: time in seconds. Labeled value: 1274.]
[Figure: execution progress against number of iterations, in four panels: TeraSort 50 New using COBYLA, TeraSort 50 B using COBYLA, TeraSort 50 A using BOBYQA, TeraSort 50 C using BOBYQA.]
With the DFO method, the main difference between the BOBYQA and COBYLA algorithms is the variation in
execution time between job iterations of the dfo_hadoop_terasort application. It comes mainly from how each
algorithm approximates the objective function around the sampled points: in quadratic or linear form, respectively.
[Figure: Difference between Algorithms COBYLA and BOBYQA. y-axis: time in seconds; x-axis ticks: 1 to 49. Series: TeraSort 50 New/COBYLA, TeraSort 50 B/COBYLA, TeraSort 50 A/BOBYQA, TeraSort 50 C/BOBYQA.]
Conclusion
• The total execution time converges more stably with
COBYLA, with fewer fluctuations than with the
BOBYQA algorithm.
• The BOBYQA algorithm gives an average speedup of 12%
in the execution of the TeraSort application.
• The COBYLA algorithm gives an average speedup of
21.15% in the execution of the TeraSort application
over the initial Hadoop settings, a greater
improvement than the BOBYQA
algorithm.
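The slides do not state how the speedup percentages were computed. One common definition, given here only as an assumption, is the relative reduction in run time against a baseline. The sketch below uses that definition; the function name is made up for this example, and results computed this way from the tables will not necessarily match the 12% and 21.15% reported above.

```c
#include <assert.h>

/*
 * Relative reduction in run time, in percent:
 *   100 * (baseline - tuned) / baseline
 * Assumption: this is one possible reading of "speedup"; the slides do not
 * give the formula or the baseline they used.
 */
double speedup_percent(double baseline_seconds, double tuned_seconds)
{
    assert(baseline_seconds > 0.0);
    return 100.0 * (baseline_seconds - tuned_seconds) / baseline_seconds;
}
```

For instance, taking the first iteration of the TeraSort 50 B run (1437 s) as a stand-in baseline and its best time (1113 s) as the tuned value, `speedup_percent(1437.0, 1113.0)` comes to roughly 22.5. That is a single-run figure under these assumptions, not the average quoted in the conclusion.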

Introduction to IEEE STANDARDS and its different types.pptx
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 

DFO Optimization of Hadoop Cluster Configuration for Terasort Benchmark

  • 1. Using Derivative-Free Optimization in the Hadoop Cluster with TeraSort. Renato dos Santos Alves & Sarosh Farjam. Projeto de Experimentos, 03.7.2014
  • 2. Sequence • Abstract • Introduction • Workload Analysis of Search Engines • Benchmarking Methodology and Decisions • Scalable Data Generation Tool • Case Studies • Conclusions
  • 3. Introduction • Running the MapReduce cluster benchmark TeraSort under a DFO method. • Each iteration of the DFO method proposes new values for the Hadoop configuration parameters. • Because these parameters are set within the framework, a tool is needed to apply them across the cluster so that the TeraSort application runs correctly. • Chef server and Chef client.
  • 4. TeraSort Benchmark. TeraSort includes 3 MapReduce applications: ● TeraGen: generates the data. ● TeraSort: samples the input data and uses the samples with MapReduce to sort the data. ● TeraValidate: validates that the output is sorted.
  • 5. DFO Method • Derivative-free optimization is a branch of mathematical optimization. • It covers problems for which derivative information is unavailable, and methods that do not use derivatives. • The derivative of a function of a real variable measures how sensitive one quantity (the dependent variable) is to changes in another quantity (the independent variable) that determines it. E.g. the derivative of the position of a moving object with respect to time is the object's velocity.
  • 6. Algorithm BOBYQA • BOBYQA (Bound Optimization BY Quadratic Approximation) is a numerical optimization algorithm by Michael J. D. Powell. • It is also the name of Powell's Fortran 77 implementation of the algorithm. • BOBYQA solves bound-constrained optimization problems without using derivatives of the objective function, which makes it a derivative-free algorithm. • The algorithm solves the problem using a trust-region method that forms quadratic models by interpolation. One new point is computed on each iteration, usually by solving a trust-region subproblem subject to the bound constraints.
  • 7. Algorithm COBYLA • Constrained Optimization BY Linear Approximation (COBYLA) is a numerical optimization method for constrained problems where the derivative of the objective function is not known. • It was invented by Michael J. D. Powell while he was working for Westland Helicopters. • COBYLA proceeds by iteratively approximating the actual constrained optimization problem with linear programming problems.
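The difference between the two model types described on slides 6 and 7 can be written out as follows. This is a sketch of the standard textbook formulation, not taken from the slides; the symbols ($f$ for the objective, $x_k$ for the current iterate, $\Delta_k$ for the trust-region radius, $y_i$ for the interpolation points, $l$ and $u$ for the bounds) are notation assumed here for illustration.

```latex
% BOBYQA: quadratic model built by interpolation, minimized inside a
% trust region intersected with the bound constraints.
Q_k(x) = c_k + g_k^{\top}(x - x_k) + \tfrac{1}{2}(x - x_k)^{\top} H_k (x - x_k),
\qquad Q_k(y_i) = f(y_i), \quad i = 1, \dots, m,

\min_{d} \; Q_k(x_k + d)
\quad \text{subject to} \quad l \le x_k + d \le u, \quad \lVert d \rVert \le \Delta_k .

% COBYLA: linear model interpolating f at the n + 1 vertices of a simplex;
% the constraints are linearized in the same way, so each step is a
% linear programming problem restricted to the trust region.
L_k(x) = f(x_k) + g_k^{\top}(x - x_k),
\qquad L_k(y_i) = f(y_i), \quad i = 0, \dots, n .
```

The quadratic model carries curvature information through $H_k$, while the linear model has only a slope $g_k$; this is the "quadratic or linear form" distinction the results slides refer to.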
  • 8. Hadoop Environment • A physical cluster with 29 nodes was used: • One Hadoop master server (running the JobTracker and NameNode services). • 28 Hadoop slaves (dedicated to the TaskTracker and DataNode services). • 2 Gigabit Ethernet for connectivity between the 29 nodes.
  • 9. Hadoop Environment • A front-end server provides access to the cluster. It is configured as a Chef server and is also used to organize the DFO TeraSort runs: it synchronizes the DFO executions and updates the Hadoop parameter settings based on each iteration of the DFO method.
  • 10. Experiment Execution • Nemesis, a server that is not part of the cluster, is used as the front end: it launches the TeraSort application, runs the DFO method, and updates the Hadoop settings based on the method's output. • The dfo_hadoop_terasort application, executed on the front-end server, synchronizes the TeraSort runs and the Hadoop updates with the output of the DFO method. • The dfo_hadoop_terasort application is supplied with a file that contains the initial values of the Hadoop configuration parameters, constraints that keep these values out of unwanted ranges, an output value for the objective function, a tolerance value for the constraints, and the maximum number of iterations. By processing the input file and interacting with the Hadoop cluster, the application discovers which parameter values have the greatest impact on the execution time of the TeraSort application, and it outputs a file with the best configuration parameters found.
  • 11. Experiment Execution • The cluster consisted of 28 slave servers with two processors each, for a total of 56 available processing slots. It was decided to keep 10% of this total free for tasks that had to be executed more than once because of failures. Therefore, about 100 gigabytes of data generated by Hadoop TeraGen were used. • To compare how well the job execution time is optimized, two DFO methods, BOBYQA and COBYLA, were run in order to identify which one best suits the TeraSort application provided by Hadoop. • Each algorithm was run twice, with 50 iterations, to identify the point at which the executions converge to a better runtime.
  • 12. Switching between COBYLA and BOBYQA • /* COBYLA algorithm: Constrained Optimization BY Linear Approximations */ • opt = nlopt_create(NLOPT_LN_COBYLA, N); • /* To use BOBYQA instead, comment out the line above and uncomment the line below. */ • //opt = nlopt_create(NLOPT_LN_BOBYQA, N); • nlopt_set_lower_bounds(opt, lb); • nlopt_set_upper_bounds(opt, ub); • nlopt_set_max_objective(opt, objetivo, NULL);
  • 14. Commands • First: generate the TeraSort input data. • TeraGen will generate approximately 100 GB (100 000 179 688 bytes). • $ hadoop jar $HADOOP_HOME/hadoop-*examples*.jar teragen <number of 100-byte rows> <output dir> • $ hadoop jar hadoop-examples-1.0.4.jar teragen 1000000000 terasort-input
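The row count passed to teragen on slide 14 follows from the fixed 100-byte row size. The sketch below shows that arithmetic; the function and macro names are hypothetical helpers introduced here for illustration and are not part of Hadoop or of the authors' tooling.

```c
#include <stdint.h>

/* teragen takes its size argument as a number of 100-byte rows. */
#define TERAGEN_ROW_BYTES 100ULL

/* Hypothetical helper: rows to request for a target payload size in bytes. */
uint64_t teragen_rows_for_bytes(uint64_t target_bytes)
{
    return target_bytes / TERAGEN_ROW_BYTES;
}

/* Hypothetical helper: nominal payload size in bytes for a given row count. */
uint64_t teragen_bytes_for_rows(uint64_t rows)
{
    return rows * TERAGEN_ROW_BYTES;
}
```

For a nominal 100 GB (100 000 000 000 bytes), teragen_rows_for_bytes returns 1 000 000 000, the value used in the command on the slide. The slide reports a slightly larger size on disk (100 000 179 688 bytes); the source does not explain the difference.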
  • 15. Commands • Second: run the dfo_hadoop_terasort application. • [hadoop@nemesis otimizacao]$ nohup sudo time ./dfo_hadoop_terasort < entrada > log_execucao_terasort &
  • 16. Results • The DFO method was used with the BOBYQA and COBYLA algorithms. • The main difference between them is how much the execution time varies across the job iterations of the dfo_hadoop_terasort application. • This difference comes mainly from how each algorithm approximates the objective function at the sampled points: in quadratic or linear form, respectively.
  • 17. TeraSort 50 A. Time F in seconds per iteration (iteration: time): 1: 1491, 2: 1501, 3: 1447, 4: 1889, 5: 2076, 6: 1466, 7: 1470, 8: 1319, 9: 1897, 10: 1611, 11: 1440, 12: 1588, 13: 1321, 14: 1897, 15: 1289, 16: 1704, 17: 1294, 18: 1313, 19: 1728, 20: 1971, 21: 1842
  • 18. [Figure: TeraSort 50 A using BOBYQA, execution progress. x-axis: number of iterations (1 to 21); y-axis: time in seconds (1000 to 2400); labeled value: 1289.]
  • 19. TeraSort 50 C. Time F in seconds per iteration (iteration: time): 1: 1587, 2: 1721, 3: 1473, 4: 1669, 5: 1801, 6: 1833, 7: 1486, 8: 1709, 9: 1510, 10: 1962, 11: 1988, 12: 1934, 13: 1898, 14: 2277, 15: 1933, 16: 1516, 17: 1601, 18: 1561, 19: 1639, 20: 1515, 21: 1507, 22: 2205, 23: 1838, 24: 2419, 25: 1744, 26: 1566, 27: 1619, 28: 1890, 29: 1988, 30: 1875, 31: 1620, 32: 1780, 33: 1607, 34: 1536, 35: 1621, 36: 1580, 37: 1626, 38: 1675, 39: 2065
  • 20. [Figure: TeraSort 50 C using BOBYQA, execution progress. x-axis: number of iterations (1 to 38); y-axis: time in seconds (1000 to 3000); labeled value: 1473.]
  • 21. TeraSort 50 B. Time F in seconds per iteration (iteration: time): 1: 1437, 2: 1343, 3: 1335, 4: 1228, 5: 1213, 6: 1240, 7: 1198, 8: 1203, 9: 1231, 10: 1178, 11: 1174, 12: 1187, 13: 1186, 14: 1204, 15: 1128, 16: 1150, 17: 1190, 18: 1165, 19: 1190, 20: 1208, 21: 1204, 22: 1113, 23: 1171, 24: 1185, 25: 1190, 26: 1170, 27: 1155, 28: 1211, 29: 1159, 30: 1198, 31: 1206, 32: 1144, 33: 1177, 34: 1179, 35: 1232, 36: 1157, 37: 1201, 38: 1150, 39: 1195, 40: 1178, 41: 1237, 42: 1196, 43: 1233, 44: 1356, 45: 1400, 46: 1674, 47: 1424, 48: 1365, 49: 1366, 50: 1320
  • 22. [Figure: TeraSort 50 B using COBYLA, execution progress. x-axis: number of iterations (1 to 50); y-axis: time in seconds (1000 to 1800); labeled value: 1113.]
  • 23. TeraSort 50 New. Time F in seconds per iteration (iteration: time): 1: 1442, 2: 1298, 3: 1285, 4: 1285, 5: 1274, 6: 1329, 7: 1343, 8: 1314, 9: 1289, 10: 1308, 11: 1304, 12: 1322, 13: 1345, 14: 1421, 15: 1369, 16: 1336, 17: 1348, 18: 1335, 19: 1333, 20: 1307, 21: 1367, 22: 1369, 23: 1352, 24: 1362, 25: 1390, 26: 1350, 27: 1324, 28: 1382, 29: 1347, 30: 1339
  • 24. [Figure: TeraSort 50 New using COBYLA, execution progress. x-axis: number of iterations (1 to 30); y-axis: time in seconds (1000 to 1800); labeled value: 1274.]
  • 25. [Figure: four panels, each showing execution progress against the number of iterations: TeraSort 50 New using COBYLA; TeraSort 50 B using COBYLA; TeraSort 50 A using BOBYQA; TeraSort 50 C using BOBYQA.]
  • 26. The DFO runs with the BOBYQA and COBYLA algorithms differ mainly in how much the execution time varies across the job iterations of the dfo_hadoop_terasort application. This comes mainly from how each algorithm approximates the objective function at the sampled points: in quadratic or linear form, respectively. [Figure: Difference between Algorithms COBYLA and BOBYQA. x-axis: iterations 1 to 49; y-axis: time in seconds (1000 to 2400); series: TeraSort 50 New/COBYLA, TeraSort 50 B/COBYLA, TeraSort 50 A/BOBYQA, TeraSort 50 C/BOBYQA.]
  • 27. Conclusion • With COBYLA, the total execution time converges more stably and with fewer fluctuations than with the BOBYQA algorithm. • The BOBYQA algorithm achieves an average speedup of 12% in the execution of the TeraSort application. • The COBYLA algorithm achieves an average speedup of 21.15% over the initial Hadoop settings, a greater improvement than the BOBYQA algorithm.
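The slides do not state how the speedup figures were computed. One reading that appears consistent with the tabulated times is sketched below, under two assumptions that are not confirmed by the source: the first iteration of each run corresponds to the initial Hadoop settings, and speedup is defined as (first time / best time − 1), averaged over the two runs of each algorithm. The function names are hypothetical.

```c
#include <stddef.h>

/* Smallest execution time (in seconds) observed in one optimization run.
   Requires n >= 1. */
double best_time(const double *times, size_t n)
{
    double best = times[0];
    for (size_t i = 1; i < n; ++i) {
        if (times[i] < best) {
            best = times[i];
        }
    }
    return best;
}

/* Assumed definition (not stated in the slides): percentage speedup of the
   best iteration relative to the first iteration of the same run. */
double speedup_percent(double first, double best)
{
    return (first / best - 1.0) * 100.0;
}

/* Mean speedup over two runs, each summarized by its first and best time. */
double mean_speedup_percent(double first_a, double best_a,
                            double first_b, double best_b)
{
    return (speedup_percent(first_a, best_a) +
            speedup_percent(first_b, best_b)) / 2.0;
}
```

With the COBYLA runs (B: first 1437 s, best 1113 s; New: first 1442 s, best 1274 s), mean_speedup_percent gives about 21.15. With the BOBYQA runs (A: first 1491 s, best 1289 s; C: first 1587 s, best 1473 s), it gives about 11.7, which rounds to the 12% reported.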