SlideShare a Scribd company logo
1 of 1
Download to read offline
Hadoop Performance Modeling for Job Estimation and Resource
Provisioning
Abstract:
MapReduce has become a major computing model for data intensive
applications. Hadoop, an open source implementation of MapReduce, has been
adopted by an increasingly growing user community.Cloud computing service
providers such as Amazon EC2 Cloud offer the opportunities for Hadoop users to
lease a certain amount of resources and pay for their use. However, a key
challenge is thatcloud service providers do not have a resource provisioning
mechanism to satisfy user jobs with deadline requirements. Currently, it is solely
the user's responsibility to estimate the required amount of resources for running
a job in the cloud. This paper presents a Hadoop job performance model that
accurately estimates job completion time and further provisions the required
amount of resources for a job to be completed within a deadline. The proposed
model builds on historical job execution records and employs Locally Weighted
Linear Regression (LWLR) technique to estimate the execution time of a job.
Furthermore, it employs Lagrange Multipliers technique for resource provisioning
to satisfy jobs with deadline requirements. The proposed model is initially
evaluated on an in-house Hadoop cluster and subsequently evaluated in the
Amazon EC2 Cloud. Experimental results show that the accuracy of the proposed
model in job execution estimation is in the range of 94.97 and 95.51 percent, and
jobs are completed within the required deadlines following on the resource
provisioning scheme of the proposed model.

More Related Content

What's hot

What's hot (20)

Data scientist a perfect job
Data scientist a perfect jobData scientist a perfect job
Data scientist a perfect job
 
Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview Hadoop Ecosystem Architecture Overview
Hadoop Ecosystem Architecture Overview
 
MapR and Machine Learning Primer
MapR and Machine Learning PrimerMapR and Machine Learning Primer
MapR and Machine Learning Primer
 
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataState of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
 
High Resolution Energy Modeling that Scales with Apache Spark 2.0 Spark Summi...
High Resolution Energy Modeling that Scales with Apache Spark 2.0 Spark Summi...High Resolution Energy Modeling that Scales with Apache Spark 2.0 Spark Summi...
High Resolution Energy Modeling that Scales with Apache Spark 2.0 Spark Summi...
 
Airfare prediction using Machine Learning with Apache Spark on 1 billion obse...
Airfare prediction using Machine Learning with Apache Spark on 1 billion obse...Airfare prediction using Machine Learning with Apache Spark on 1 billion obse...
Airfare prediction using Machine Learning with Apache Spark on 1 billion obse...
 
Hive - Cost Based Optimizer
Hive - Cost Based OptimizerHive - Cost Based Optimizer
Hive - Cost Based Optimizer
 
Multi objective tasks scheduling algorithm for cloud computing throughput opt...
Multi objective tasks scheduling algorithm for cloud computing throughput opt...Multi objective tasks scheduling algorithm for cloud computing throughput opt...
Multi objective tasks scheduling algorithm for cloud computing throughput opt...
 
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
 
h2oensemble with Erin Ledell at useR! Aalborg
h2oensemble with Erin Ledell at useR! Aalborgh2oensemble with Erin Ledell at useR! Aalborg
h2oensemble with Erin Ledell at useR! Aalborg
 
Evolution of spark framework for simplifying data analysis.
Evolution of spark framework for simplifying data analysis.Evolution of spark framework for simplifying data analysis.
Evolution of spark framework for simplifying data analysis.
 
Serverless Data Architecture at scale on Google Cloud Platform
Serverless Data Architecture at scale on Google Cloud PlatformServerless Data Architecture at scale on Google Cloud Platform
Serverless Data Architecture at scale on Google Cloud Platform
 
Apache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision TreesApache Spark Machine Learning Decision Trees
Apache Spark Machine Learning Decision Trees
 
Build 2017 - P4002 - Speedup Interactive Analytics on Petabytes of Data on Azure
Build 2017 - P4002 - Speedup Interactive Analytics on Petabytes of Data on AzureBuild 2017 - P4002 - Speedup Interactive Analytics on Petabytes of Data on Azure
Build 2017 - P4002 - Speedup Interactive Analytics on Petabytes of Data on Azure
 
Predicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine LearningPredicting Flight Delays with Spark Machine Learning
Predicting Flight Delays with Spark Machine Learning
 
Greenplum-Spark November 2018
Greenplum-Spark November 2018Greenplum-Spark November 2018
Greenplum-Spark November 2018
 
[Giovanni Galloro] How to use machine learning on Google Cloud Platform
[Giovanni Galloro] How to use machine learning on Google Cloud Platform[Giovanni Galloro] How to use machine learning on Google Cloud Platform
[Giovanni Galloro] How to use machine learning on Google Cloud Platform
 
CBO-2
CBO-2CBO-2
CBO-2
 
Enterprise Data Lakes
Enterprise Data LakesEnterprise Data Lakes
Enterprise Data Lakes
 
Apache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouseApache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouse
 

Similar to Hadoop performance modeling for job estimation and resource provisioning

Hadoop performance modeling for job
Hadoop performance modeling for jobHadoop performance modeling for job
Hadoop performance modeling for job
ranjith kumar
 
REVIEW 2 PDC 20BCE1577.pptx
REVIEW 2 PDC 20BCE1577.pptxREVIEW 2 PDC 20BCE1577.pptx
REVIEW 2 PDC 20BCE1577.pptx
praful91
 

Similar to Hadoop performance modeling for job estimation and resource provisioning (20)

Hadoop performance modeling for job
Hadoop performance modeling for jobHadoop performance modeling for job
Hadoop performance modeling for job
 
Hadoop performance modeling for job estimation and resource provisioning
Hadoop performance modeling for job estimation and resource provisioningHadoop performance modeling for job estimation and resource provisioning
Hadoop performance modeling for job estimation and resource provisioning
 
Scheduling in cloud computing
Scheduling in cloud computingScheduling in cloud computing
Scheduling in cloud computing
 
A hadoop map reduce
A hadoop map reduceA hadoop map reduce
A hadoop map reduce
 
REVIEW 2 PDC 20BCE1577.pptx
REVIEW 2 PDC 20BCE1577.pptxREVIEW 2 PDC 20BCE1577.pptx
REVIEW 2 PDC 20BCE1577.pptx
 
Ieeepro techno solutions ieee java project - budget-driven scheduling algor...
Ieeepro techno solutions   ieee java project - budget-driven scheduling algor...Ieeepro techno solutions   ieee java project - budget-driven scheduling algor...
Ieeepro techno solutions ieee java project - budget-driven scheduling algor...
 
Ieeepro techno solutions ieee dotnet project - budget-driven scheduling alg...
Ieeepro techno solutions   ieee dotnet project - budget-driven scheduling alg...Ieeepro techno solutions   ieee dotnet project - budget-driven scheduling alg...
Ieeepro techno solutions ieee dotnet project - budget-driven scheduling alg...
 
Deadline-aware MapReduce Job Scheduling with Dynamic Resource Availability
Deadline-aware MapReduce Job Scheduling with Dynamic Resource AvailabilityDeadline-aware MapReduce Job Scheduling with Dynamic Resource Availability
Deadline-aware MapReduce Job Scheduling with Dynamic Resource Availability
 
Job scheduling in hybrid cloud using deep reinforcement learning for cost opt...
Job scheduling in hybrid cloud using deep reinforcement learning for cost opt...Job scheduling in hybrid cloud using deep reinforcement learning for cost opt...
Job scheduling in hybrid cloud using deep reinforcement learning for cost opt...
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
Cloud batch a batch job queuing system on clouds with hadoop and h-base
Cloud batch  a batch job queuing system on clouds with hadoop and h-baseCloud batch  a batch job queuing system on clouds with hadoop and h-base
Cloud batch a batch job queuing system on clouds with hadoop and h-base
 
Design architecture based on web
Design architecture based on webDesign architecture based on web
Design architecture based on web
 
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
 
Characterization of hadoop jobs using unsupervised learning
Characterization of hadoop jobs using unsupervised learningCharacterization of hadoop jobs using unsupervised learning
Characterization of hadoop jobs using unsupervised learning
 
Ieeepro techno solutions 2014 ieee java project - deadline based resource p...
Ieeepro techno solutions   2014 ieee java project - deadline based resource p...Ieeepro techno solutions   2014 ieee java project - deadline based resource p...
Ieeepro techno solutions 2014 ieee java project - deadline based resource p...
 
Ieeepro techno solutions 2014 ieee dotnet project - deadline based resource...
Ieeepro techno solutions   2014 ieee dotnet project - deadline based resource...Ieeepro techno solutions   2014 ieee dotnet project - deadline based resource...
Ieeepro techno solutions 2014 ieee dotnet project - deadline based resource...
 
Ieeepro techno solutions 2014 ieee dotnet project - deadline based resource...
Ieeepro techno solutions   2014 ieee dotnet project - deadline based resource...Ieeepro techno solutions   2014 ieee dotnet project - deadline based resource...
Ieeepro techno solutions 2014 ieee dotnet project - deadline based resource...
 
Map-Reduce Synchronized and Comparative Queue Capacity Scheduler in Hadoop fo...
Map-Reduce Synchronized and Comparative Queue Capacity Scheduler in Hadoop fo...Map-Reduce Synchronized and Comparative Queue Capacity Scheduler in Hadoop fo...
Map-Reduce Synchronized and Comparative Queue Capacity Scheduler in Hadoop fo...
 

More from ieeepondy

More from ieeepondy (20)

Demand aware network function placement
Demand aware network function placementDemand aware network function placement
Demand aware network function placement
 
Service description in the nfv revolution trends, challenges and a way forward
Service description in the nfv revolution trends, challenges and a way forwardService description in the nfv revolution trends, challenges and a way forward
Service description in the nfv revolution trends, challenges and a way forward
 
Secure optimization computation outsourcing in cloud computing a case study o...
Secure optimization computation outsourcing in cloud computing a case study o...Secure optimization computation outsourcing in cloud computing a case study o...
Secure optimization computation outsourcing in cloud computing a case study o...
 
Spatial related traffic sign inspection for inventory purposes using mobile l...
Spatial related traffic sign inspection for inventory purposes using mobile l...Spatial related traffic sign inspection for inventory purposes using mobile l...
Spatial related traffic sign inspection for inventory purposes using mobile l...
 
Standards for hybrid clouds
Standards for hybrid cloudsStandards for hybrid clouds
Standards for hybrid clouds
 
Rfhoc a random forest approach to auto-tuning hadoop's configuration
Rfhoc a random forest approach to auto-tuning hadoop's configurationRfhoc a random forest approach to auto-tuning hadoop's configuration
Rfhoc a random forest approach to auto-tuning hadoop's configuration
 
Resource and instance hour minimization for deadline constrained dag applicat...
Resource and instance hour minimization for deadline constrained dag applicat...Resource and instance hour minimization for deadline constrained dag applicat...
Resource and instance hour minimization for deadline constrained dag applicat...
 
Reliable and confidential cloud storage with efficient data forwarding functi...
Reliable and confidential cloud storage with efficient data forwarding functi...Reliable and confidential cloud storage with efficient data forwarding functi...
Reliable and confidential cloud storage with efficient data forwarding functi...
 
Rebuttal to “comments on ‘control cloud data access privilege and anonymity w...
Rebuttal to “comments on ‘control cloud data access privilege and anonymity w...Rebuttal to “comments on ‘control cloud data access privilege and anonymity w...
Rebuttal to “comments on ‘control cloud data access privilege and anonymity w...
 
Scalable cloud–sensor architecture for the internet of things
Scalable cloud–sensor architecture for the internet of thingsScalable cloud–sensor architecture for the internet of things
Scalable cloud–sensor architecture for the internet of things
 
Scalable algorithms for nearest neighbor joins on big trajectory data
Scalable algorithms for nearest neighbor joins on big trajectory dataScalable algorithms for nearest neighbor joins on big trajectory data
Scalable algorithms for nearest neighbor joins on big trajectory data
 
Robust workload and energy management for sustainable data centers
Robust workload and energy management for sustainable data centersRobust workload and energy management for sustainable data centers
Robust workload and energy management for sustainable data centers
 
Privacy preserving deep computation model on cloud for big data feature learning
Privacy preserving deep computation model on cloud for big data feature learningPrivacy preserving deep computation model on cloud for big data feature learning
Privacy preserving deep computation model on cloud for big data feature learning
 
Pricing the cloud ieee projects, ieee projects chennai, ieee projects 2016,ie...
Pricing the cloud ieee projects, ieee projects chennai, ieee projects 2016,ie...Pricing the cloud ieee projects, ieee projects chennai, ieee projects 2016,ie...
Pricing the cloud ieee projects, ieee projects chennai, ieee projects 2016,ie...
 
Protection of big data privacy
Protection of big data privacyProtection of big data privacy
Protection of big data privacy
 
Power optimization with bler constraint for wireless fronthauls in c ran
Power optimization with bler constraint for wireless fronthauls in c ranPower optimization with bler constraint for wireless fronthauls in c ran
Power optimization with bler constraint for wireless fronthauls in c ran
 
Performance aware cloud resource allocation via fitness-enabled auction
Performance aware cloud resource allocation via fitness-enabled auctionPerformance aware cloud resource allocation via fitness-enabled auction
Performance aware cloud resource allocation via fitness-enabled auction
 
Performance limitations of a text search application running in cloud instances
Performance limitations of a text search application running in cloud instancesPerformance limitations of a text search application running in cloud instances
Performance limitations of a text search application running in cloud instances
 
Performance analysis and optimal cooperative cluster size for randomly distri...
Performance analysis and optimal cooperative cluster size for randomly distri...Performance analysis and optimal cooperative cluster size for randomly distri...
Performance analysis and optimal cooperative cluster size for randomly distri...
 
Predictive control for energy aware consolidation in cloud datacenters
Predictive control for energy aware consolidation in cloud datacentersPredictive control for energy aware consolidation in cloud datacenters
Predictive control for energy aware consolidation in cloud datacenters
 

Recently uploaded

Personalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes GuàrdiaPersonalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
EADTU
 
SURVEY I created for uni project research
SURVEY I created for uni project researchSURVEY I created for uni project research
SURVEY I created for uni project research
CaitlinCummins3
 
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
AnaAcapella
 

Recently uploaded (20)

Including Mental Health Support in Project Delivery, 14 May.pdf
Including Mental Health Support in Project Delivery, 14 May.pdfIncluding Mental Health Support in Project Delivery, 14 May.pdf
Including Mental Health Support in Project Delivery, 14 May.pdf
 
Sternal Fractures & Dislocations - EMGuidewire Radiology Reading Room
Sternal Fractures & Dislocations - EMGuidewire Radiology Reading RoomSternal Fractures & Dislocations - EMGuidewire Radiology Reading Room
Sternal Fractures & Dislocations - EMGuidewire Radiology Reading Room
 
AIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.pptAIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.ppt
 
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes GuàrdiaPersonalisation of Education by AI and Big Data - Lourdes Guàrdia
Personalisation of Education by AI and Big Data - Lourdes Guàrdia
 
Book Review of Run For Your Life Powerpoint
Book Review of Run For Your Life PowerpointBook Review of Run For Your Life Powerpoint
Book Review of Run For Your Life Powerpoint
 
VAMOS CUIDAR DO NOSSO PLANETA! .
VAMOS CUIDAR DO NOSSO PLANETA!                    .VAMOS CUIDAR DO NOSSO PLANETA!                    .
VAMOS CUIDAR DO NOSSO PLANETA! .
 
How to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptxHow to Manage Website in Odoo 17 Studio App.pptx
How to Manage Website in Odoo 17 Studio App.pptx
 
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
 
How to Send Pro Forma Invoice to Your Customers in Odoo 17
How to Send Pro Forma Invoice to Your Customers in Odoo 17How to Send Pro Forma Invoice to Your Customers in Odoo 17
How to Send Pro Forma Invoice to Your Customers in Odoo 17
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & Systems
 
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
TỔNG HỢP HƠN 100 ĐỀ THI THỬ TỐT NGHIỆP THPT TOÁN 2024 - TỪ CÁC TRƯỜNG, TRƯỜNG...
 
SURVEY I created for uni project research
SURVEY I created for uni project researchSURVEY I created for uni project research
SURVEY I created for uni project research
 
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...When Quality Assurance Meets Innovation in Higher Education - Report launch w...
When Quality Assurance Meets Innovation in Higher Education - Report launch w...
 
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
24 ĐỀ THAM KHẢO KÌ THI TUYỂN SINH VÀO LỚP 10 MÔN TIẾNG ANH SỞ GIÁO DỤC HẢI DƯ...
 
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUMDEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
 
ESSENTIAL of (CS/IT/IS) class 07 (Networks)
ESSENTIAL of (CS/IT/IS) class 07 (Networks)ESSENTIAL of (CS/IT/IS) class 07 (Networks)
ESSENTIAL of (CS/IT/IS) class 07 (Networks)
 
MOOD STABLIZERS DRUGS.pptx
MOOD     STABLIZERS           DRUGS.pptxMOOD     STABLIZERS           DRUGS.pptx
MOOD STABLIZERS DRUGS.pptx
 
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPSSpellings Wk 4 and Wk 5 for Grade 4 at CAPS
Spellings Wk 4 and Wk 5 for Grade 4 at CAPS
 
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdfFICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
FICTIONAL SALESMAN/SALESMAN SNSW 2024.pdf
 
Observing-Correct-Grammar-in-Making-Definitions.pptx
Observing-Correct-Grammar-in-Making-Definitions.pptxObserving-Correct-Grammar-in-Making-Definitions.pptx
Observing-Correct-Grammar-in-Making-Definitions.pptx
 

Hadoop performance modeling for job estimation and resource provisioning

  • 1. Hadoop Performance Modeling for Job Estimation and Resource Provisioning Abstract: MapReduce has become a major computing model for data intensive applications. Hadoop, an open source implementation of MapReduce, has been adopted by an increasingly growing user community.Cloud computing service providers such as Amazon EC2 Cloud offer the opportunities for Hadoop users to lease a certain amount of resources and pay for their use. However, a key challenge is thatcloud service providers do not have a resource provisioning mechanism to satisfy user jobs with deadline requirements. Currently, it is solely the user's responsibility to estimate the required amount of resources for running a job in the cloud. This paper presents a Hadoop job performance model that accurately estimates job completion time and further provisions the required amount of resources for a job to be completed within a deadline. The proposed model builds on historical job execution records and employs Locally Weighted Linear Regression (LWLR) technique to estimate the execution time of a job. Furthermore, it employs Lagrange Multipliers technique for resource provisioning to satisfy jobs with deadline requirements. The proposed model is initially evaluated on an in-house Hadoop cluster and subsequently evaluated in the Amazon EC2 Cloud. Experimental results show that the accuracy of the proposed model in job execution estimation is in the range of 94.97 and 95.51 percent, and jobs are completed within the required deadlines following on the resource provisioning scheme of the proposed model.