SlideShare a Scribd company logo
1 of 3
Download to read offline
Do Your Projects With Domain Experts…
Copyright © 2015 LeMeniz Infotech. All rights reserved
LeMeniz Infotech
36, 100 Feet Road, Natesan Nagar, Near Indira Gandhi Statue,
Pondicherry-605 005.
Call: 0413-4205444, +91 9566355386, 99625 88976.
Web : www.lemenizinfotech.com / www.ieeemaster.com
Mail : projects@lemenizinfotech.com
RFHOC: A Random-Forest Approach to Auto-Tuning Hadoop’s Configuration
ABSTRACT:
Hadoop is a widely-used implementation framework of the MapReduce
programming model for large-scale data processing. Hadoop performance however is
significantly affected by the settings of the Hadoop configuration parameters.
Unfortunately, manually tuning these parameters is very time-consuming, if at all practical.
This paper proposes an approach, called RFHOC, to automatically tune the Hadoop
configuration parameters for optimized performance for a given application running on a
given cluster. RFHOC constructs two ensembles of performance models using a random-
forest approach for the map and reduce stage respectively. Leveraging these models,
RFHOC employs a genetic algorithm to automatically search the Hadoop configuration
space. The evaluation of RFHOC using five typical Hadoop programs, each with five
different input data sets, shows that it achieves a performance speedup by a factor of
2.11on average and up to 7.4over the recently proposed cost-based optimization (CBO)
approach. In addition, RFHOC’s performance benefit increases with input data set size.
INTRODUCTION
MAPREDUCE is a widely used programming model for processing and generating vast
data sets on large-scale compute clusters. Hadoop is the most popular open-source
MapReduce framework, using which a broad set of applications have been developed,
including web indexing, machine learning, log file analysis, financial analysis [4] and
bioinformatics processing. A typical characteristic of these applications is that they run
repeatedly with different input data sets.
EXISTING SYSTEM
The Hadoop framework has up to 190 configuration parameters, and overall
performance is highly sensitive to the settings of these parameters. Because the Hadoop
configuration for optimum performance is applicationspecific, applying the default or a
single set of configuration settings optimized for a certain application to a wide range of
Do Your Projects With Domain Experts…
Copyright © 2015 LeMeniz Infotech. All rights reserved
LeMeniz Infotech
36, 100 Feet Road, Natesan Nagar, Near Indira Gandhi Statue,
Pondicherry-605 005.
Call: 0413-4205444, +91 9566355386, 99625 88976.
Web : www.lemenizinfotech.com / www.ieeemaster.com
Mail : projects@lemenizinfotech.com
applications leads to suboptimal
Performance
DisADVANTAGE OF Existing SYSTEM
 Application is extremely tedious and time-consuming, and may even cause serious
performance degradation.
PROPOSED SYSTEM
In Proposed System RFHOC, a novel methodology to optimize Hadoop
performance by leveraging the notion of a random forest to build accurate and robust
performance prediction models for the phases of the map and reduce stage of a Hadoop
program of interest.
ADVANTAGE OF PROPOSED SYSTEM
 Hadoop configuration setting that leads to optimized application performance. We
evaluate RFHOC using 5 Hadoop benchmarks, each with 5 input data sets ranging
from 50 GB to 1 TB. The results show that RFHOC speeds up Hadoop programs
 RFHOC’s performance benefits to increase with increasing input data set sizes.
ARCHITECTURE:
Do Your Projects With Domain Experts…
Copyright © 2015 LeMeniz Infotech. All rights reserved
LeMeniz Infotech
36, 100 Feet Road, Natesan Nagar, Near Indira Gandhi Statue,
Pondicherry-605 005.
Call: 0413-4205444, +91 9566355386, 99625 88976.
Web : www.lemenizinfotech.com / www.ieeemaster.com
Mail : projects@lemenizinfotech.com
HARDWARE REQUIREMENTS:
 System : Pentium IV 2.4 GHz.
 Hard Disk : 40 GB.
 Floppy Drive : 44 Mb.
 Monitor : 15 VGA Colour.
SOFTWARE REQUIREMENTS:
 Operating system : Windows 7.
 Coding Language : Java 1.7 ,Hadoop 0.8.1
 Database : MySql 5
 IDE : Eclipse

More Related Content

Viewers also liked

Lion opportunities plan BY AL AMIN
Lion opportunities plan BY  AL AMINLion opportunities plan BY  AL AMIN
Lion opportunities plan BY AL AMIN
Alamin Pathan
 
Sports, culture, and the media
Sports, culture, and the mediaSports, culture, and the media
Sports, culture, and the media
kellyjean11kk
 
My ppt game carlos suarez
My ppt game carlos suarezMy ppt game carlos suarez
My ppt game carlos suarez
carlos27748318
 

Viewers also liked (10)

Ten Modern Plagues - for Seder Discussion
Ten Modern Plagues - for Seder DiscussionTen Modern Plagues - for Seder Discussion
Ten Modern Plagues - for Seder Discussion
 
Bikok T. Pierre
Bikok T. PierreBikok T. Pierre
Bikok T. Pierre
 
Props
PropsProps
Props
 
Lion opportunities plan BY AL AMIN
Lion opportunities plan BY  AL AMINLion opportunities plan BY  AL AMIN
Lion opportunities plan BY AL AMIN
 
Sports, culture, and the media
Sports, culture, and the mediaSports, culture, and the media
Sports, culture, and the media
 
Trusted db a trusted hardware based database with privacy and data confidenti...
Trusted db a trusted hardware based database with privacy and data confidenti...Trusted db a trusted hardware based database with privacy and data confidenti...
Trusted db a trusted hardware based database with privacy and data confidenti...
 
The Rediscovery of Colour: Supplemental Material for Holonomics
The Rediscovery of Colour: Supplemental Material for HolonomicsThe Rediscovery of Colour: Supplemental Material for Holonomics
The Rediscovery of Colour: Supplemental Material for Holonomics
 
Dominating set and network coding based routing in wireless mesh netwoks
Dominating set and network coding based routing in wireless mesh netwoksDominating set and network coding based routing in wireless mesh netwoks
Dominating set and network coding based routing in wireless mesh netwoks
 
My ppt game carlos suarez
My ppt game carlos suarezMy ppt game carlos suarez
My ppt game carlos suarez
 
การประชุมวิชาการทางสัตวศาสตร์แห่งชาติครั้งที่ ๓
การประชุมวิชาการทางสัตวศาสตร์แห่งชาติครั้งที่ ๓การประชุมวิชาการทางสัตวศาสตร์แห่งชาติครั้งที่ ๓
การประชุมวิชาการทางสัตวศาสตร์แห่งชาติครั้งที่ ๓
 

Similar to Rfhoc a random forest approach to auto-tuning hadoop’s configuration

YARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
YARN: the Key to overcoming the challenges of broad-based Hadoop AdoptionYARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
YARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
DataWorks Summit
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
Mahantesh Angadi
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
paperpublications3
 
Orca: A Modular Query Optimizer Architecture for Big Data
Orca: A Modular Query Optimizer Architecture for Big DataOrca: A Modular Query Optimizer Architecture for Big Data
Orca: A Modular Query Optimizer Architecture for Big Data
EMC
 

Similar to Rfhoc a random forest approach to auto-tuning hadoop’s configuration (20)

Hadoop performance modeling for job estimation and resource provisioning
Hadoop performance modeling for job estimation and resource provisioningHadoop performance modeling for job estimation and resource provisioning
Hadoop performance modeling for job estimation and resource provisioning
 
Rfhoc a random forest approach to auto-tuning hadoop's configuration
Rfhoc a random forest approach to auto-tuning hadoop's configurationRfhoc a random forest approach to auto-tuning hadoop's configuration
Rfhoc a random forest approach to auto-tuning hadoop's configuration
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce Framework
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce FrameworkBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce Framework
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce Framework
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
 
Dy scale a mapreduce job scheduler for heterogeneous multicore processors
Dy scale a mapreduce job scheduler for heterogeneous multicore processorsDy scale a mapreduce job scheduler for heterogeneous multicore processors
Dy scale a mapreduce job scheduler for heterogeneous multicore processors
 
YARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
YARN: the Key to overcoming the challenges of broad-based Hadoop AdoptionYARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
YARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
 
Integration of SAP HANA with Hadoop
Integration of SAP HANA with HadoopIntegration of SAP HANA with Hadoop
Integration of SAP HANA with Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
IEEE 2014 JAVA DATA MINING PROJECTS Scalable keyword search on large rdf data
IEEE 2014 JAVA DATA MINING PROJECTS Scalable keyword search on large rdf dataIEEE 2014 JAVA DATA MINING PROJECTS Scalable keyword search on large rdf data
IEEE 2014 JAVA DATA MINING PROJECTS Scalable keyword search on large rdf data
 
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduceBIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
BIGDATA- Survey on Scheduling Methods in Hadoop MapReduce
 
43_Sameer_Kumar_Das2
43_Sameer_Kumar_Das243_Sameer_Kumar_Das2
43_Sameer_Kumar_Das2
 
Recommendation engine
Recommendation engineRecommendation engine
Recommendation engine
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
 
Cost effective resource provisioning for map reduce in a cloud
Cost effective resource provisioning for map reduce in a cloudCost effective resource provisioning for map reduce in a cloud
Cost effective resource provisioning for map reduce in a cloud
 
An effective classification approach for big data with parallel generalized H...
An effective classification approach for big data with parallel generalized H...An effective classification approach for big data with parallel generalized H...
An effective classification approach for big data with parallel generalized H...
 
Integrating dbm ss as a read only execution layer into hadoop
Integrating dbm ss as a read only execution layer into hadoopIntegrating dbm ss as a read only execution layer into hadoop
Integrating dbm ss as a read only execution layer into hadoop
 
Hadoop tutorial for Freshers,
Hadoop tutorial for Freshers, Hadoop tutorial for Freshers,
Hadoop tutorial for Freshers,
 
Novel Scheduling Algorithms for Efficient Deployment of Map Reduce Applicatio...
Novel Scheduling Algorithms for Efficient Deployment of Map Reduce Applicatio...Novel Scheduling Algorithms for Efficient Deployment of Map Reduce Applicatio...
Novel Scheduling Algorithms for Efficient Deployment of Map Reduce Applicatio...
 
Orca: A Modular Query Optimizer Architecture for Big Data
Orca: A Modular Query Optimizer Architecture for Big DataOrca: A Modular Query Optimizer Architecture for Big Data
Orca: A Modular Query Optimizer Architecture for Big Data
 
Fast raq a fast approach to range aggregate queries in big data environments
Fast raq a fast approach to range aggregate queries in big data environmentsFast raq a fast approach to range aggregate queries in big data environments
Fast raq a fast approach to range aggregate queries in big data environments
 

More from LeMeniz Infotech

Interleaved digital power factor correction based on the sliding mode approach
Interleaved digital power factor correction based on the sliding mode approachInterleaved digital power factor correction based on the sliding mode approach
Interleaved digital power factor correction based on the sliding mode approach
LeMeniz Infotech
 
Bumpless control for reduced thd in power factor correction circuits
Bumpless control for reduced thd in power factor correction circuitsBumpless control for reduced thd in power factor correction circuits
Bumpless control for reduced thd in power factor correction circuits
LeMeniz Infotech
 
A bidirectional three level llc resonant converter with pwam control
A bidirectional three level llc resonant converter with pwam controlA bidirectional three level llc resonant converter with pwam control
A bidirectional three level llc resonant converter with pwam control
LeMeniz Infotech
 
Efficient single phase transformerless inverter for grid tied pvg system with...
Efficient single phase transformerless inverter for grid tied pvg system with...Efficient single phase transformerless inverter for grid tied pvg system with...
Efficient single phase transformerless inverter for grid tied pvg system with...
LeMeniz Infotech
 
Highly reliable transformerless photovoltaic inverters with leakage current a...
Highly reliable transformerless photovoltaic inverters with leakage current a...Highly reliable transformerless photovoltaic inverters with leakage current a...
Highly reliable transformerless photovoltaic inverters with leakage current a...
LeMeniz Infotech
 
Grid current-feedback active damping for lcl resonance in grid-connected volt...
Grid current-feedback active damping for lcl resonance in grid-connected volt...Grid current-feedback active damping for lcl resonance in grid-connected volt...
Grid current-feedback active damping for lcl resonance in grid-connected volt...
LeMeniz Infotech
 
Delay dependent stability of single-loop controlled grid-connected inverters ...
Delay dependent stability of single-loop controlled grid-connected inverters ...Delay dependent stability of single-loop controlled grid-connected inverters ...
Delay dependent stability of single-loop controlled grid-connected inverters ...
LeMeniz Infotech
 

More from LeMeniz Infotech (20)

A fast acquisition all-digital delay-locked loop using a starting-bit predict...
A fast acquisition all-digital delay-locked loop using a starting-bit predict...A fast acquisition all-digital delay-locked loop using a starting-bit predict...
A fast acquisition all-digital delay-locked loop using a starting-bit predict...
 
A fast fault tolerant architecture for sauvola local image thresholding algor...
A fast fault tolerant architecture for sauvola local image thresholding algor...A fast fault tolerant architecture for sauvola local image thresholding algor...
A fast fault tolerant architecture for sauvola local image thresholding algor...
 
A dynamically reconfigurable multi asip architecture for multistandard and mu...
A dynamically reconfigurable multi asip architecture for multistandard and mu...A dynamically reconfigurable multi asip architecture for multistandard and mu...
A dynamically reconfigurable multi asip architecture for multistandard and mu...
 
Interleaved digital power factor correction based on the sliding mode approach
Interleaved digital power factor correction based on the sliding mode approachInterleaved digital power factor correction based on the sliding mode approach
Interleaved digital power factor correction based on the sliding mode approach
 
Bumpless control for reduced thd in power factor correction circuits
Bumpless control for reduced thd in power factor correction circuitsBumpless control for reduced thd in power factor correction circuits
Bumpless control for reduced thd in power factor correction circuits
 
A bidirectional single stage three phase rectifier with high-frequency isolat...
A bidirectional single stage three phase rectifier with high-frequency isolat...A bidirectional single stage three phase rectifier with high-frequency isolat...
A bidirectional single stage three phase rectifier with high-frequency isolat...
 
A bidirectional three level llc resonant converter with pwam control
A bidirectional three level llc resonant converter with pwam controlA bidirectional three level llc resonant converter with pwam control
A bidirectional three level llc resonant converter with pwam control
 
Efficient single phase transformerless inverter for grid tied pvg system with...
Efficient single phase transformerless inverter for grid tied pvg system with...Efficient single phase transformerless inverter for grid tied pvg system with...
Efficient single phase transformerless inverter for grid tied pvg system with...
 
Highly reliable transformerless photovoltaic inverters with leakage current a...
Highly reliable transformerless photovoltaic inverters with leakage current a...Highly reliable transformerless photovoltaic inverters with leakage current a...
Highly reliable transformerless photovoltaic inverters with leakage current a...
 
Grid current-feedback active damping for lcl resonance in grid-connected volt...
Grid current-feedback active damping for lcl resonance in grid-connected volt...Grid current-feedback active damping for lcl resonance in grid-connected volt...
Grid current-feedback active damping for lcl resonance in grid-connected volt...
 
Delay dependent stability of single-loop controlled grid-connected inverters ...
Delay dependent stability of single-loop controlled grid-connected inverters ...Delay dependent stability of single-loop controlled grid-connected inverters ...
Delay dependent stability of single-loop controlled grid-connected inverters ...
 
Connection of converters to a low and medium power dc network using an induct...
Connection of converters to a low and medium power dc network using an induct...Connection of converters to a low and medium power dc network using an induct...
Connection of converters to a low and medium power dc network using an induct...
 
Stamp enabling privacy preserving location proofs for mobile users
Stamp enabling privacy preserving location proofs for mobile usersStamp enabling privacy preserving location proofs for mobile users
Stamp enabling privacy preserving location proofs for mobile users
 
Sbvlc secure barcode based visible light communication for smartphones
Sbvlc secure barcode based visible light communication for smartphonesSbvlc secure barcode based visible light communication for smartphones
Sbvlc secure barcode based visible light communication for smartphones
 
Read2 me a cloud based reading aid for the visually impaired
Read2 me a cloud based reading aid for the visually impairedRead2 me a cloud based reading aid for the visually impaired
Read2 me a cloud based reading aid for the visually impaired
 
Privacy preserving location sharing services for social networks
Privacy preserving location sharing services for social networksPrivacy preserving location sharing services for social networks
Privacy preserving location sharing services for social networks
 
Pass byo bring your own picture for securing graphical passwords
Pass byo bring your own picture for securing graphical passwordsPass byo bring your own picture for securing graphical passwords
Pass byo bring your own picture for securing graphical passwords
 
Eplq efficient privacy preserving location-based query over outsourced encryp...
Eplq efficient privacy preserving location-based query over outsourced encryp...Eplq efficient privacy preserving location-based query over outsourced encryp...
Eplq efficient privacy preserving location-based query over outsourced encryp...
 
Analyzing ad library updates in android apps
Analyzing ad library updates in android appsAnalyzing ad library updates in android apps
Analyzing ad library updates in android apps
 
An exploration of geographic authentication scheme
An exploration of geographic authentication schemeAn exploration of geographic authentication scheme
An exploration of geographic authentication scheme
 

Recently uploaded

會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
中 央社
 
SPLICE Working Group: Reusable Code Examples
SPLICE Working Group:Reusable Code ExamplesSPLICE Working Group:Reusable Code Examples
SPLICE Working Group: Reusable Code Examples
Peter Brusilovsky
 

Recently uploaded (20)

TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
 
8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management8 Tips for Effective Working Capital Management
8 Tips for Effective Working Capital Management
 
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
 
Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"
 
The Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDFThe Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDF
 
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽會考英聽
 
MOOD STABLIZERS DRUGS.pptx
MOOD     STABLIZERS           DRUGS.pptxMOOD     STABLIZERS           DRUGS.pptx
MOOD STABLIZERS DRUGS.pptx
 
OS-operating systems- ch05 (CPU Scheduling) ...
OS-operating systems- ch05 (CPU Scheduling) ...OS-operating systems- ch05 (CPU Scheduling) ...
OS-operating systems- ch05 (CPU Scheduling) ...
 
How to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptxHow to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptx
 
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUMDEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
 
Spring gala 2024 photo slideshow - Celebrating School-Community Partnerships
Spring gala 2024 photo slideshow - Celebrating School-Community PartnershipsSpring gala 2024 photo slideshow - Celebrating School-Community Partnerships
Spring gala 2024 photo slideshow - Celebrating School-Community Partnerships
 
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH FORM 50 CÂU TRẮC NGHI...
 
ESSENTIAL of (CS/IT/IS) class 07 (Networks)
ESSENTIAL of (CS/IT/IS) class 07 (Networks)ESSENTIAL of (CS/IT/IS) class 07 (Networks)
ESSENTIAL of (CS/IT/IS) class 07 (Networks)
 
An Overview of the Odoo 17 Knowledge App
An Overview of the Odoo 17 Knowledge AppAn Overview of the Odoo 17 Knowledge App
An Overview of the Odoo 17 Knowledge App
 
SPLICE Working Group: Reusable Code Examples
SPLICE Working Group:Reusable Code ExamplesSPLICE Working Group:Reusable Code Examples
SPLICE Working Group: Reusable Code Examples
 
Book Review of Run For Your Life Powerpoint
Book Review of Run For Your Life PowerpointBook Review of Run For Your Life Powerpoint
Book Review of Run For Your Life Powerpoint
 
diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....
 
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjjStl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjj
 
male presentation...pdf.................
male presentation...pdf.................male presentation...pdf.................
male presentation...pdf.................
 
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptx
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptxAnalyzing and resolving a communication crisis in Dhaka textiles LTD.pptx
Analyzing and resolving a communication crisis in Dhaka textiles LTD.pptx
 

Rfhoc a random forest approach to auto-tuning hadoop’s configuration

  • 1. Do Your Projects With Domain Experts… Copyright © 2015 LeMeniz Infotech. All rights reserved LeMeniz Infotech 36, 100 Feet Road, Natesan Nagar, Near Indira Gandhi Statue, Pondicherry-605 005. Call: 0413-4205444, +91 9566355386, 99625 88976. Web : www.lemenizinfotech.com / www.ieeemaster.com Mail : projects@lemenizinfotech.com RFHOC: A Random-Forest Approach to Auto-Tuning Hadoop’s Configuration ABSTRACT: Hadoop is a widely-used implementation framework of the MapReduce programming model for large-scale data processing. Hadoop performance however is significantly affected by the settings of the Hadoop configuration parameters. Unfortunately, manually tuning these parameters is very time-consuming, if at all practical. This paper proposes an approach, called RFHOC, to automatically tune the Hadoop configuration parameters for optimized performance for a given application running on a given cluster. RFHOC constructs two ensembles of performance models using a random- forest approach for the map and reduce stage respectively. Leveraging these models, RFHOC employs a genetic algorithm to automatically search the Hadoop configuration space. The evaluation of RFHOC using five typical Hadoop programs, each with five different input data sets, shows that it achieves a performance speedup by a factor of 2.11on average and up to 7.4over the recently proposed cost-based optimization (CBO) approach. In addition, RFHOC’s performance benefit increases with input data set size. INTRODUCTION MAPREDUCE is a widely used programming model for processing and generating vast data sets on large-scale compute clusters. Hadoop is the most popular open-source MapReduce framework, using which a broad set of applications have been developed, including web indexing, machine learning, log file analysis, financial analysis [4] and bioinformatics processing. A typical characteristic of these applications is that they run repeatedly with different input data sets. EXISTING SYSTEM The Hadoop framework has up to 190 configuration parameters, and overall performance is highly sensitive to the settings of these parameters. Because the Hadoop configuration for optimum performance is applicationspecific, applying the default or a single set of configuration settings optimized for a certain application to a wide range of
  • 2. Do Your Projects With Domain Experts… Copyright © 2015 LeMeniz Infotech. All rights reserved LeMeniz Infotech 36, 100 Feet Road, Natesan Nagar, Near Indira Gandhi Statue, Pondicherry-605 005. Call: 0413-4205444, +91 9566355386, 99625 88976. Web : www.lemenizinfotech.com / www.ieeemaster.com Mail : projects@lemenizinfotech.com applications leads to suboptimal Performance DisADVANTAGE OF Existing SYSTEM  Application is extremely tedious and time-consuming, and may even cause serious performance degradation. PROPOSED SYSTEM In Proposed System RFHOC, a novel methodology to optimize Hadoop performance by leveraging the notion of a random forest to build accurate and robust performance prediction models for the phases of the map and reduce stage of a Hadoop program of interest. ADVANTAGE OF PROPOSED SYSTEM  Hadoop configuration setting that leads to optimized application performance. We evaluate RFHOC using 5 Hadoop benchmarks, each with 5 input data sets ranging from 50 GB to 1 TB. The results show that RFHOC speeds up Hadoop programs  RFHOC’s performance benefits to increase with increasing input data set sizes. ARCHITECTURE:
  • 3. Do Your Projects With Domain Experts… Copyright © 2015 LeMeniz Infotech. All rights reserved LeMeniz Infotech 36, 100 Feet Road, Natesan Nagar, Near Indira Gandhi Statue, Pondicherry-605 005. Call: 0413-4205444, +91 9566355386, 99625 88976. Web : www.lemenizinfotech.com / www.ieeemaster.com Mail : projects@lemenizinfotech.com HARDWARE REQUIREMENTS:  System : Pentium IV 2.4 GHz.  Hard Disk : 40 GB.  Floppy Drive : 44 Mb.  Monitor : 15 VGA Colour. SOFTWARE REQUIREMENTS:  Operating system : Windows 7.  Coding Language : Java 1.7 ,Hadoop 0.8.1  Database : MySql 5  IDE : Eclipse