Research on Scheduling Scheme
for Hadoop clusters
By: Jiong Xie, FanJun Meng, HaiLong Wang, HongFang Pan, JinHong Cheng, Xiao Qin
04/22/14, CSC 8710
Outline
• What is Hadoop?
• Hadoop Characteristics
• Hadoop Objectives
• Big Data Challenges
• Hadoop Architecture
• What is the predictive scheduling and prefetching mechanism?
• Hadoop Issues
• Hadoop Scheduler
• PSP Scheduler
• Conclusion
Goal
• Design a prefetching mechanism that solves the data-movement problem in MapReduce and improves performance.
What is Hadoop?
• Hadoop is an open-source software framework for handling large amounts of data and processing them on clusters of commodity hardware.
Characteristics
• It is a framework of tools
- Not a single program, as some people think
• Open-source tools
• Distributed under the Apache License
• Linux-based tools
• It uses a distributed model
- Not one big, powerful computer, but many low-cost computers
Objectives
• Hadoop supports running applications on Big Data.
• Therefore, Hadoop addresses Big Data challenges.
[Figure: Hadoop supports running applications on Big Data]
Big Data Challenges
Why Do We Need Hadoop?
• A single powerful computer can process data only up to the point where the quantity of data exceeds its capacity.
• Beyond that point, we need a tool such as Hadoop.
• Hadoop uses a different strategy to deal with data.
Hadoop Functionality
• Hadoop breaks the data into smaller pieces and distributes them evenly across different nodes so they can be processed at the same time.
• Similarly, Hadoop divides the computation evenly among the nodes.
• The results are combined and then sent back to the application.
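The split, process, and combine pattern described above can be sketched in a few lines of Python. This is a single-machine illustration, assuming a word-count job; worker threads stand in for cluster nodes, and the function names (`split_evenly`, `process_chunk`, `run_job`) are hypothetical, not Hadoop APIs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_evenly(records, num_nodes):
    """Break the input into num_nodes chunks whose sizes differ by at most one."""
    size, extra = divmod(len(records), num_nodes)
    chunks, start = [], 0
    for i in range(num_nodes):
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Per-node computation: count the words in this chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def run_job(records, num_nodes=3):
    """Split the data, process the chunks in parallel, then combine the results."""
    chunks = split_evenly(records, num_nodes)
    # Threads stand in for cluster nodes in this single-machine sketch.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(process_chunk, chunks))
    combined = Counter()
    for partial in partials:
        combined.update(partial)
    return combined


if __name__ == "__main__":
    data = ["big data big", "hadoop data", "big hadoop", "data"]
    print(run_job(data))
```

Because each chunk's counts are independent, the partial results can be merged in any order, which keeps the final combining step simple.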
Hadoop Functionality
[Figure: the input data (Big Data) is divided equally among the nodes, each node performs its computation, and the results are combined and returned]
Architecture
• Hadoop consists of two main components:
– MapReduce: divides the workload into smaller pieces
– File System (HDFS): stores the data, tolerates component failures, and keeps a directory of all stored files
• Other projects provide additional functionality:
– Pig
– Hive
– HBase
– Flume
– Mahout
– Oozie
– Sqoop
[Figure: Hadoop consists of MapReduce and the file system (HDFS)]
Architecture
• Each slave computer runs two components:
- Task Tracker: processes the tasks it is given; it represents the MapReduce component.
- Data Node: stores the data blocks that the Task Tracker processes; it represents the HDFS component.
Architecture
• The master computer runs four components:
- Job Tracker: part of the MapReduce component; it breaks a job into smaller tasks and distributes them evenly among the Task Trackers.
- Task Tracker: processes the tasks it is given.
- Name Node: keeps an index of all files in HDFS and the locations of their blocks.
- Data Node: stores the data blocks that the Task Tracker processes.
Architecture
Fault Tolerance for Data
• By default, Hadoop keeps three copies (replicas) of each data block, and each copy is stored on a different node.
• If a Task Tracker fails, the Job Tracker detects the failure and asks another Task Tracker to take over its work.
• The Name Node's tables are also backed up on a different computer. This is why the enterprise version of Hadoop keeps two masters: a working master and a backup master.
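The two fault-tolerance mechanisms above can be modeled in a few lines of Python. This is a toy sketch under stated assumptions: the helper names `place_replicas` and `reassign_tasks` are hypothetical, replicas are placed on randomly chosen distinct nodes (real HDFS placement is rack-aware), and failure detection itself is not modeled.

```python
import random

REPLICATION_FACTOR = 3  # HDFS default number of copies per block


def place_replicas(block_ids, nodes, replication=REPLICATION_FACTOR, seed=0):
    """Map each block to `replication` distinct nodes (a NameNode-style index).

    Simplification: nodes are picked at random; real HDFS also considers racks.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    rng = random.Random(seed)
    return {block: rng.sample(nodes, replication) for block in block_ids}


def reassign_tasks(assignments, failed_tracker, live_trackers):
    """Move every task of a failed Task Tracker to the least-loaded live tracker.

    assignments: {task_id: tracker_id}. Returns a new mapping; ties in load
    go to the tracker listed first in live_trackers.
    """
    load = {t: 0 for t in live_trackers}
    for tracker in assignments.values():
        if tracker in load:
            load[tracker] += 1
    updated = dict(assignments)
    for task, tracker in assignments.items():
        if tracker == failed_tracker:
            target = min(live_trackers, key=lambda t: load[t])
            updated[task] = target
            load[target] += 1
    return updated
```

Keeping each replica on a different node means that losing any single node still leaves two copies of every block it held.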
Scalability Cost
• Cost scales roughly linearly: to increase speed, add more computers.
Predictive Scheduling and Prefetching
• Implement a predictive scheduling and prefetching (PSP) mechanism in Hadoop to improve performance.
• Predictive scheduler:
- A flexible task scheduler that predicts the most appropriate Task Trackers for the next data.
• Prefetching module:
– Forces preload worker threads to start loading data into the node's main memory before the current task finishes. The timing depends on the estimated finish time.
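The estimated-time trigger can be sketched as a simple rule: start preloading the next block once the current task's estimated remaining time is no longer than the time needed to load that block. The linear progress-rate estimate, the `margin_s` safety margin, and all names here are assumptions for illustration, not the PSP paper's actual formula.

```python
from dataclasses import dataclass


@dataclass
class TaskProgress:
    progress: float   # fraction of the current task completed, in [0, 1]
    elapsed_s: float  # seconds since the current task started


def estimated_remaining_s(tp: TaskProgress) -> float:
    """Linear estimate: assume the task keeps its average rate so far."""
    if tp.progress <= 0 or tp.elapsed_s <= 0:
        return float("inf")  # no progress observed yet, so no estimate
    rate = tp.progress / tp.elapsed_s
    return (1.0 - tp.progress) / rate


def should_start_prefetch(tp, block_bytes, disk_bytes_per_s, margin_s=0.5):
    """Start preloading once the remaining time fits within the load time.

    margin_s is a hypothetical safety margin covering estimation error.
    """
    load_time_s = block_bytes / disk_bytes_per_s
    return estimated_remaining_s(tp) <= load_time_s + margin_s
```

Starting too early wastes memory on data that is not yet needed; starting too late leaves the CPU idle, which is why the trigger is tied to the estimated finish time.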
PSP
• Factors that make PSP possible:
- Underutilized CPUs
- The importance of MapReduce performance
- Storage availability in HDFS
- Interaction between the nodes
Hadoop’s Issue
• In the current MapReduce model, all tasks are managed by the master node, so computing nodes must ask the master node to assign them new tasks.
• The master node tells each computing node what its next task is and where the task's data is located.
• CPU time is wasted while the computing node communicates with the master node.
Hadoop’s Issue
• The original Hadoop assigns tasks randomly, and input data is loaded from a local or remote disk to the computing node only when it is required.
• The CPU of a computing node cannot start processing until all of the input data has been loaded into main memory.
• This hurts Hadoop’s performance.
Prefetching
• Prefetching forces preload worker threads to start loading data from the local disk into the node's main memory before the current task finishes.
• This reduces waiting time, so the next task can be processed without delay.
• As a result, the performance of the MapReduce system improves.
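A back-of-the-envelope timing model shows why overlapping data loading with computation reduces waiting time. It assumes a single preload thread that starts loading block i+1 as soon as task i starts, and that loading and computing do not slow each other down; both assumptions are simplifications, and the function names are hypothetical.

```python
def serial_time(load, compute):
    """No prefetching: each task waits for its own data before computing."""
    return sum(l + c for l, c in zip(load, compute))


def prefetch_time(load, compute):
    """One-block-ahead prefetching: block i+1 loads while task i computes.

    Task i+1 can start only when task i has finished AND block i+1 has
    arrived, so each step costs max(compute[i], load[i+1]).
    """
    if not load:
        return 0
    next_load = list(load[1:]) + [0]  # nothing left to load after the last task
    return load[0] + sum(max(c, nl) for c, nl in zip(compute, next_load))
```

With three tasks that each need 2 s to load and 3 s to compute, `serial_time` gives 15 s and `prefetch_time` gives 11 s: only the first load remains on the critical path.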
Hadoop Scheduler
• In the original Hadoop scheduler, the Job Tracker includes a task scheduler module that assigns tasks to different Task Trackers.
• Task Trackers periodically send heartbeats to the Job Tracker.
• The Job Tracker checks the heartbeats and sends tasks to the available Task Trackers.
• The scheduler assigns tasks randomly to the nodes via the same heartbeat message protocol.
• It also mispredicts stragglers in many cases.
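The heartbeat-driven, random assignment described above can be modeled as follows. This hypothetical sketch follows the slide's description only: each heartbeat reports a Task Tracker's free slots, and each free slot receives a randomly chosen pending task. It deliberately ignores data locality and task progress, so it has no basis for anticipating stragglers.

```python
import random


def random_heartbeat_scheduler(pending_tasks, heartbeats, seed=None):
    """Simplified model of heartbeat-based random task assignment.

    pending_tasks: task ids waiting to run.
    heartbeats: (tracker_id, free_slots) messages in arrival order.
    Returns a list of (tracker_id, task_id) assignments.
    """
    rng = random.Random(seed)
    pending = list(pending_tasks)
    assignments = []
    for tracker, free_slots in heartbeats:
        for _ in range(free_slots):
            if not pending:
                return assignments
            # Pick any pending task at random: no locality or progress info used.
            task = pending.pop(rng.randrange(len(pending)))
            assignments.append((tracker, task))
    return assignments
```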
Predictive Scheduler
• PSP builds a predictive scheduler by designing a prediction algorithm and integrating it with the original Hadoop.
• The predictive scheduler predicts stragglers and finds the appropriate data blocks.
• Prediction decisions are made by a prediction module during the prefetching stage.
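The slides do not give the prediction algorithm itself. As a stand-in, the sketch below flags likely stragglers using a simple progress-rate heuristic (an assumption, similar in spirit to the LATE scheduler): each task's remaining time is extrapolated from its average rate so far, and tasks expected to take much longer than the median are flagged. The `slowdown` threshold and the function names are hypothetical.

```python
def estimate_remaining(progress, elapsed_s):
    """Estimate seconds left per task from its average progress rate so far.

    progress: {task: fraction done}, elapsed_s: {task: seconds running}.
    """
    remaining = {}
    for task, p in progress.items():
        t = elapsed_s[task]
        if p <= 0 or t <= 0:
            remaining[task] = float("inf")  # no evidence yet: treat as slowest
        else:
            remaining[task] = (1.0 - p) * t / p  # (1 - p) / (p / t)
    return remaining


def predict_stragglers(progress, elapsed_s, slowdown=1.5):
    """Flag tasks expected to finish much later than the median running task."""
    remaining = estimate_remaining(progress, elapsed_s)
    ordered = sorted(remaining.values())
    median = ordered[len(ordered) // 2]  # upper median for even counts
    return sorted(t for t, r in remaining.items() if r > slowdown * median)
```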
Hadoop Function
Launching Process
• Three basic steps to launch a task:
- Copying the job JAR from the shared file system to the Task Tracker's local file system, along with all other required files.
- Creating a local working directory for the task and un-jarring the contents of the JAR into that directory.
- Passing the task to the Task Tracker to be processed.
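The first two launch steps can be imitated on a single machine with Python's standard library, since a JAR file is a ZIP archive. The directory layout (`jobcache`, `<task_id>/work`) and the function name are hypothetical, not Hadoop's actual layout, and the third step (running the task) is not modeled.

```python
import shutil
import zipfile
from pathlib import Path


def localize_and_unjar(shared_jar: Path, local_root: Path, task_id: str) -> Path:
    """Steps 1-2 of task launch, as a local sketch.

    1. Copy the job JAR from the "shared" location to the tracker's local disk.
    2. Create a working directory for the task and un-jar the JAR into it.
    Returns the task's working directory.
    """
    job_dir = local_root / "jobcache"  # hypothetical local cache directory
    job_dir.mkdir(parents=True, exist_ok=True)
    local_jar = Path(shutil.copy2(shared_jar, job_dir / shared_jar.name))

    work_dir = local_root / task_id / "work"
    work_dir.mkdir(parents=True, exist_ok=True)
    # A JAR is a ZIP archive, so zipfile can extract its contents.
    with zipfile.ZipFile(local_jar) as jar:
        jar.extractall(work_dir)
    return work_dir
```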
Launching Process
• In PSP, all of the steps above are monitored by the prediction module, which predicts three events:
- The finish time of the task currently being processed.
- The tasks that are going to be assigned to the Task Trackers.
- The launch time of the pending tasks.
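Given predicted finish times for the currently running tasks, the other two events (which Task Tracker receives each pending task, and when that task launches) can be derived with a min-heap of "tracker becomes free" times. This sketch assumes first-in, first-out ordering of pending tasks and known per-task duration estimates; all names are hypothetical.

```python
import heapq


def predict_launches(current_remaining_s, pending, est_duration_s):
    """Predict assignments and launch times of pending tasks.

    current_remaining_s: {tracker: predicted seconds until its current task ends}
    pending: task ids in the order they will be scheduled (FIFO assumed)
    est_duration_s: {task: estimated run time in seconds}
    Returns a list of (task, tracker, predicted_launch_s).
    """
    if pending and not current_remaining_s:
        raise ValueError("no Task Trackers available")
    # Min-heap of (time the tracker becomes free, tracker id).
    free_at = [(t, tracker) for tracker, t in current_remaining_s.items()]
    heapq.heapify(free_at)
    plan = []
    for task in pending:
        launch_s, tracker = heapq.heappop(free_at)  # earliest free tracker
        plan.append((task, tracker, launch_s))
        heapq.heappush(free_at, (launch_s + est_duration_s[task], tracker))
    return plan
```

Knowing the predicted launch time and tracker for each pending task is what would let a prefetcher start moving that task's data to the right node ahead of time.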
Prefetching
• Three issues must be addressed:
- When to prefetch
- What to prefetch
- How much to prefetch
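The "what" and "how much" questions can be sketched as a filter over the predicted access order: skip blocks already in memory and stop once the free cache space is used up. The fixed block size, the names, and the in-order policy are assumptions for illustration.

```python
def plan_prefetch(predicted_blocks, cached, block_size, cache_free_bytes):
    """Choose which predicted blocks to preload, limited by free cache space.

    predicted_blocks: block ids in the order the node is expected to read them
    cached: block ids already in memory (nothing to fetch for these)
    Returns the block ids to prefetch, in order.
    """
    budget = cache_free_bytes // block_size  # "how much": whole blocks that fit
    plan = []
    for block in predicted_blocks:  # "what": the next blocks to be accessed
        if len(plan) >= budget:
            break
        if block not in cached and block not in plan:
            plan.append(block)
    return plan
```

With predicted blocks `["b1", "b2", "b3", "b4"]`, `b2` already cached, 64-byte blocks, and 130 free bytes, the plan is `["b1", "b3"]`: two blocks fit, and `b2` is skipped.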
Conclusion
• The authors propose a predictive scheduling and prefetching (PSP) mechanism aimed at enhancing Hadoop performance.
• The prediction module predicts the data blocks to be accessed by the computing nodes in a cluster.
• The prefetching module preloads this future set of data into the nodes' caches.
• Evaluated on a 10-node cluster, PSP reduces execution time by up to 28%, and by 19% on average.
• It also increases overall throughput and I/O utilization.
Resources
• http://ac.els-cdn.com/S1877050913005668/1-s2.0-S1877050913005668-main.pdf?_tid=00e2b8e8-8d59-11e3-be92-00000aacb362&acdnat=1391490095_5f34abbe9f98d3b8a0978b2464478da1
• http://blog.vitria.com/bid/87945/Big-Data-Analytics-Challenges-Facing-All-Communications-Service-Providers
• http://blog.raremile.com/hadoop-demystified/
• http://namitkabra.wordpress.com/category/etl/page/2/
• http://odbms.org/download/Pro%20Hadoop%20Ch.%201.pdf
• http://hadoop.apache.org/docs/r0.18.0/hdfs_design.pdf
• http://wiki.apache.org/hadoop/Defining%20Hadoop
• https://engineering.purdue.edu/~ychu/ee673/Projects.F11/detectstraggeler_finalrpt.pdf
Suggested Algorithm to improve Hadoop's performance.

  • 1. Research on Scheduling Scheme for Hadoop Clusters. By: Jiong Xie (a, b), FanJun Meng (c), HaiLong Wang (c), HongFang Pan (b), JinHong Cheng (b), Xiao Qin (a). 04/22/14, CSC 8710
  • 2. Outline • What is Hadoop? • Hadoop Characteristics • Hadoop Objectives • Big Data Challenges • Hadoop Architecture • What is the predictive scheduling and prefetching mechanism? • Hadoop Issues • Hadoop Scheduler • PSP Scheduler • Conclusion
  • 3. Goal • Designing a prefetching mechanism to solve the data-movement problem in MapReduce and to improve performance.
  • 4. What is Hadoop? • Hadoop is an open-source software framework used to handle large amounts of data and to process them on clusters of commodity hardware.
  • 5. Characteristics • It is a framework of tools - not a single program, as some people think. • Open-source tools. • Distributed under the Apache License. • Linux-based tools. • It works on a distributed model - not one big powerful computer, but numerous low-cost computers.
  • 6. Objectives • Hadoop supports running applications on Big Data. • Therefore, Hadoop addresses Big Data challenges.
  • 8. Why Do We Need Hadoop? • A powerful computer can process data only up to the point where the quantity of data exceeds the computer's capacity. • Beyond that point, we need a tool such as Hadoop. • Hadoop uses a different strategy to deal with data.
  • 9. Hadoop Functionality • Hadoop breaks the data into smaller pieces and distributes them equally across different nodes to be processed at the same time. • Similarly, Hadoop divides the computation equally among the nodes. • The results are combined and then sent back to the application.
  • 10. Hadoop Functionality (Diagram: the input data (Big Data) is divided equally among the nodes, each node performs its part of the computation, and the results are combined and returned as one combined result.)
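The split, compute, and combine flow described on slides 9 and 10 can be sketched as a MapReduce-style word count. This is a minimal single-machine illustration, not Hadoop's API: it assumes worker threads stand in for cluster nodes and a Python `Counter` merge stands in for the combine (reduce) step; the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, num_nodes):
    """Divide the input as evenly as possible: one chunk per node."""
    size, extra = divmod(len(records), num_nodes)
    chunks, start = [], 0
    for i in range(num_nodes):
        # The first `extra` chunks each take one leftover record.
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def map_chunk(lines):
    """Per-node computation: count the words in this node's chunk only."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def word_count(records, num_nodes=3):
    chunks = split_into_chunks(records, num_nodes)
    # Worker threads stand in for nodes processing their chunks at the same time.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial_results = list(pool.map(map_chunk, chunks))
    # Combine step: merge the partial counts into the final result.
    combined = Counter()
    for partial in partial_results:
        combined.update(partial)
    return combined
```

For example, `word_count(["big data", "big cluster", "data node"])` gives a count of 2 for "big" and "data" and 1 for "cluster" and "node". Each node only ever sees its own chunk, which is why the partial results must be combined at the end.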
  • 11. Architecture • Hadoop consists of two main components: - MapReduce: divides the workload into smaller pieces. - File System (HDFS): stores the data, accounts for component failure, and keeps a directory of all the files. - Other projects provide additional functionality: • Pig • Hive • HBase • Flume • Mahout • Oozie • Sqoop (Diagram: Hadoop = MapReduce + File System (HDFS).)
  • 12. Architecture • Slave computers consist of two components: - Task Tracker: processes the given task; it represents the MapReduce component. - Data Node: manages the piece of data given to the Task Tracker; it represents HDFS.
  • 13. Architecture • The master computer consists of four components: - Job Tracker: works under the MapReduce component; it breaks the job into smaller tasks and divides them equally among the Task Trackers. - Task Tracker: processes the given task. - Name Node: keeps an index of all the files and of where their data is stored. - Data Node: manages the piece of data given to the Task Tracker.
  • 15. Fault Tolerance for Data • By default, Hadoop keeps three copies of each block of a file, and each copy is stored on a different node. • If any Task Tracker fails, the Job Tracker detects the failure and asks another Task Tracker to take over that job. • The tables in the Name Node are also backed up on a different computer; this is why the enterprise version of Hadoop keeps two masters: a working master and a backup master.
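The two fault-tolerance ideas on slide 15, three copies on distinct nodes and reassignment of a failed tracker's work, can be sketched as follows. This is a simplified model under stated assumptions: replicas are placed round-robin on a ring of DataNodes (real HDFS uses a rack-aware placement policy), and failed tasks go to the least-loaded live TaskTracker (a hypothetical choice; the slide only says "another Task Tracker"). All names are illustrative.

```python
import itertools

REPLICATION = 3  # the slide's "three copies"


def place_replicas(blocks, datanodes, replication=REPLICATION):
    """NameNode-style index: block id -> `replication` distinct DataNodes."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    index = {}
    starts = itertools.cycle(range(len(datanodes)))
    for block in blocks:
        first = next(starts)
        # Consecutive positions on the ring are distinct nodes, so no node
        # ever holds two copies of the same block.
        index[block] = [datanodes[(first + k) % len(datanodes)]
                        for k in range(replication)]
    return index


def reassign_on_failure(assignments, failed_tracker, live_trackers):
    """Move each task of a failed TaskTracker to the least-loaded live tracker."""
    new_assignments = dict(assignments)
    load = {tracker: 0 for tracker in live_trackers}
    for tracker in new_assignments.values():
        if tracker in load:
            load[tracker] += 1
    for task, tracker in sorted(new_assignments.items()):
        if tracker == failed_tracker:
            # Ties are broken by tracker name so the result is deterministic.
            target = min(live_trackers, key=lambda t: (load[t], t))
            new_assignments[task] = target
            load[target] += 1
    return new_assignments
```

Because every block lives on three different nodes, losing any single node still leaves at least two readable copies, while the reassignment step keeps the failed tracker's tasks from being lost.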
  • 16. Scalability Cost • The cost of scaling grows linearly with the number of computers: to increase speed, add more computers.
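The linear-cost claim can be made concrete with an idealized model. This assumes perfectly parallel work with no coordination overhead, which real clusters do not achieve. Let T_1 be the runtime on one computer, n the number of computers, and c the cost of one computer per hour (symbols introduced here for illustration):

```latex
T_n = \frac{T_1}{n}, \qquad
\text{cluster cost per hour} = n\,c, \qquad
\text{job cost} = n\,c \cdot T_n = c\,T_1
```

In this ideal case, hardware cost grows linearly in n while runtime shrinks as 1/n, so the total cost of a job stays constant. In practice, communication with the master and straggling tasks keep the speedup below n.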
  • 17. Predictive Scheduling and Prefetching • Implementing a predictive scheduling and prefetching (PSP) mechanism in Hadoop to improve performance. • Predictive scheduler: - A flexible task scheduler that predicts the most appropriate Task Trackers for the next data. • Prefetching module: - The part responsible for forcing the preload worker threads to start loading data into the main memory of the node before the current task finishes, based on estimated times.
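The prefetching module's "depends on estimated time" rule can be sketched as a trigger plus a preload worker thread. This is a hedged sketch, not PSP's code. It assumes the remaining time is extrapolated linearly from the task's progress, the load time is block size divided by disk bandwidth, and prefetching starts once the time left is no more than the load time. `load_block` and all other names are hypothetical.

```python
import threading


def estimated_remaining_time(elapsed_s, progress):
    """Extrapolate linearly: assume the task keeps its average rate so far."""
    if not 0 < progress <= 1:
        raise ValueError("progress must be in (0, 1]")
    return elapsed_s * (1 - progress) / progress


def should_start_prefetch(elapsed_s, progress, block_bytes,
                          disk_bytes_per_s, margin_s=0.0):
    """Trigger once the time left in the current task is no more than
    the time needed to load the next block (plus an optional margin)."""
    load_time = block_bytes / disk_bytes_per_s
    return estimated_remaining_time(elapsed_s, progress) <= load_time + margin_s


def start_preload_worker(load_block, block_id, cache):
    """Background thread that loads `block_id` into an in-memory cache."""
    def work():
        cache[block_id] = load_block(block_id)
    worker = threading.Thread(target=work, daemon=True)
    worker.start()
    return worker
```

For example, a task that has run for 30 s and is 75% done has about 10 s left, so a 64 MiB block read at 100 MB/s (about 0.67 s) is not yet due. Near 99% progress the trigger fires, and the worker thread loads the block while the CPU finishes the current task.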
  • 18. PSP • Factors that make PSP possible: - Underutilization of the CPU. - The importance of MapReduce performance. - The storage availability in HDFS. - The interaction between the nodes.
  • 19. Hadoop’s Issue • In the current MapReduce model, all tasks are managed by the master node, so the computing nodes must ask the master node to assign them a new task. • The master node tells each computing node what the next task is and where its data is located. • This wastes CPU time while the computing node communicates with the master node.
  • 20. Hadoop’s Issue • The original Hadoop assigns tasks randomly, and each task's data is loaded from local or remote disk into the computing node only when it is required. • The CPU of a computing node cannot start processing until all of the input data has been loaded into main memory. • This hurts Hadoop's performance.
  • 21. Prefetching • It forces the preload worker threads to start loading data from the local disk into the main memory of the node before the current task finishes. • The waiting time is reduced, so tasks are processed on time. • This improves the performance of the MapReduce system.
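A small timing model shows why loading the next block during the current computation (slides 20 and 21) reduces waiting time. It is a sketch under assumptions not stated in the slides: one disk that loads blocks one at a time, one CPU, and a preload buffer that holds only one block ahead. The load and compute times in the example are illustrative numbers, not measurements.

```python
def total_time_without_prefetch(loads, computes):
    """Each task waits for its whole input to load, then computes."""
    return sum(loads) + sum(computes)


def total_time_with_prefetch(loads, computes):
    """The next block loads while the current block is being computed.

    Assumes one disk, one CPU, and a buffer holding one block ahead.
    """
    load_end = 0.0       # when the disk finished the previous load
    compute_start = 0.0  # when the CPU started the previous block
    compute_end = 0.0    # when the CPU finished the previous block
    for load, compute in zip(loads, computes):
        # The disk must be free, and the buffer slot frees up once the
        # previous block has moved to the CPU.
        load_start = max(load_end, compute_start)
        load_end = load_start + load
        # The CPU needs this block's data and must be done with the last one.
        compute_start = max(load_end, compute_end)
        compute_end = compute_start + compute
    return compute_end
```

With three tasks that each take 2 s to load and 3 s to compute, the sequential schedule takes 15 s, while the prefetching schedule takes 11 s because only the first load is exposed. When loading dominates computing, the disk becomes the bottleneck and the gain shrinks.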
  • 22. Hadoop Scheduler • In the original Hadoop scheduler, the Job Tracker includes a task scheduler module that assigns tasks to the different Task Trackers. • Task Trackers periodically send heartbeats to the Job Tracker. • The Job Tracker checks the heartbeats and sends tasks to the available Task Trackers. • The scheduler assigns tasks randomly to the nodes via the same heartbeat message protocol. • It assigns tasks randomly and, in many cases, mispredicts stragglers.
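The heartbeat-driven assignment described on slide 22 can be modeled as below. This follows the slide's description (random choice of pending tasks, sent in reply to a heartbeat) and is not Hadoop's JobTracker code. The `Heartbeat` fields and the `JobTracker` class here are hypothetical simplifications; a fixed seed is used only to make runs reproducible.

```python
import random
from dataclasses import dataclass


@dataclass
class Heartbeat:
    tracker: str     # which TaskTracker is reporting
    free_slots: int  # how many more tasks it can run right now


class JobTracker:
    def __init__(self, pending_tasks, seed=None):
        self.pending = list(pending_tasks)
        self.rng = random.Random(seed)
        self.assigned = {}  # task id -> TaskTracker name

    def on_heartbeat(self, hb):
        """Reply to a heartbeat with up to `free_slots` randomly chosen tasks."""
        reply = []
        for _ in range(min(hb.free_slots, len(self.pending))):
            # Random choice, as the slide describes; no prediction involved.
            task = self.pending.pop(self.rng.randrange(len(self.pending)))
            self.assigned[task] = hb.tracker
            reply.append(task)
        return reply
```

A TaskTracker only receives work when its next heartbeat arrives, and the choice ignores how long each task is likely to take, which is the gap the predictive scheduler targets.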
  • 23. Predictive Scheduler • A predictive scheduler is built by designing a prediction algorithm integrated with the original Hadoop. • The predictive scheduler predicts stragglers and finds the appropriate data blocks. • The prediction decisions are made by a prediction module during the prefetching stage.
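The slides do not give PSP's prediction algorithm, so the following is only an illustration of what "predicting stragglers" can mean. It uses a simple progress-rate heuristic, similar in spirit to the LATE scheduler's time-left estimate. The 1.5x slowdown threshold and all names are assumptions.

```python
from statistics import mean


def time_left(progress, elapsed_s):
    """Estimated seconds left, assuming the task keeps its observed progress rate."""
    if not 0 < progress <= 1 or elapsed_s <= 0:
        raise ValueError("need 0 < progress <= 1 and elapsed_s > 0")
    rate = progress / elapsed_s  # fraction of the task completed per second
    return (1 - progress) / rate


def predict_stragglers(tasks, slowdown=1.5):
    """tasks maps task id -> (progress, elapsed seconds).

    A task is flagged when its estimated time left exceeds `slowdown`
    times the average estimate across all running tasks.
    """
    estimates = {task: time_left(p, e) for task, (p, e) in tasks.items()}
    threshold = slowdown * mean(estimates.values())
    return sorted(task for task, left in estimates.items() if left > threshold)
```

For three tasks that have each run for 10 s, two at 50% progress and one at 10%, the estimates are about 10 s, 10 s, and 90 s, so only the slow task is flagged.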
  • 25. Launching Process • Three basic steps to launch the tasks: - Copying the job from the shared file system to the Job Tracker's file system, along with all the required files. - Creating a local directory for the task and un-jarring the contents of the JAR into that directory. - Copying the task to the Task Tracker to be processed.
  • 26. Launching Process • In PSP, all of the above steps are monitored by the prediction module, which predicts three events: - The finish time of the currently processed task. - The tasks that are going to be assigned to the Task Trackers. - The launch time of the pending tasks.
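The launch-time prediction on slide 26 can be illustrated with simple arithmetic. This sketch assumes the following: a TaskTracker runs one task at a time; the pending list, in order, stands in for the predicted next tasks; and each pending task launches when the previous one is predicted to finish. It also computes when the launch steps from slide 25 (copy job files, un-jar, copy task) would need to begin so each task is ready on time. The names and the fixed setup time are hypothetical. `accumulate(..., initial=...)` requires Python 3.8 or later.

```python
from itertools import accumulate


def predict_launch_times(now_s, current_remaining_s, pending_durations_s):
    """Predicted launch time of each pending task on a single-slot TaskTracker."""
    if not pending_durations_s:
        return []
    # The first pending task launches when the current task is predicted to finish.
    first_launch = now_s + current_remaining_s
    # Each later task launches when the task before it is predicted to finish.
    return list(accumulate(pending_durations_s[:-1], initial=first_launch))


def predict_setup_start_times(launch_times_s, setup_s, now_s):
    """When to begin the launch steps so each task is ready at its launch time."""
    return [max(now_s, t - setup_s) for t in launch_times_s]
```

For example, at t = 100 s with 5 s left on the current task and pending tasks estimated at 20 s, 30 s, and 10 s, the predicted launch times are 105 s, 125 s, and 155 s. With an 8 s setup, preparation would start at 100 s (as soon as possible), 117 s, and 147 s.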
  • 27. Prefetching • These three issues must be addressed: - When to prefetch - What to prefetch - How much to prefetch
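One possible way to answer the "what" and "how much" questions is sketched below, under assumptions. "What" means the input blocks of the tasks predicted to run next, in predicted order, skipping blocks already in memory. "How much" is bounded by a fixed memory budget for prefetched data. The "when" question is a separate timing decision and is not modeled here. The function and parameter names are hypothetical; PSP's actual policy is not given on the slides.

```python
def plan_prefetch(predicted_tasks, block_of, block_size, cached, budget_bytes):
    """Choose which blocks to preload, in predicted order, within a memory budget.

    predicted_tasks: task ids in predicted launch order (what to prefetch)
    block_of:        task id -> input block id
    block_size:      block id -> size in bytes
    cached:          block ids already in main memory
    budget_bytes:    memory set aside for prefetched data (how much)
    """
    plan, used = [], 0
    for task in predicted_tasks:
        block = block_of[task]
        if block in cached or block in plan:
            continue  # already in memory or already planned
        size = block_size[block]
        if used + size > budget_bytes:
            break  # stop at the first block that does not fit, keeping the order
        plan.append(block)
        used += size
    return plan
```

Stopping at the first block that does not fit, rather than skipping ahead to smaller ones, keeps the preload order aligned with the predicted launch order, so the soonest-needed data is always loaded first.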
  • 28. Conclusion • A predictive scheduling and prefetching (PSP) mechanism is proposed to enhance Hadoop performance. • The prediction module predicts the data blocks that the computing nodes in the cluster will access. • The prefetching module preloads this future set of data into the nodes' caches. • Evaluated on a 10-node cluster, PSP reduces execution time by up to 28%, and by 19% on average. • It increases overall throughput and I/O utilization.
  • 29. Resources • http://ac.els-cdn.com/S1877050913005668/1-s2.0-S1877050913005668-main.pdf?_tid=00e2b8e8-8d59-11e3-be92-00000aacb362&acdnat=1391490095_5f34abbe9f98d3b8a0978b2464478da1 • http://blog.vitria.com/bid/87945/Big-Data-Analytics-Challenges-Facing-All-Communications-Service-Providers • http://blog.raremile.com/hadoop-demystified/ • http://namitkabra.wordpress.com/category/etl/page/2/ • http://odbms.org/download/Pro%20Hadoop%20Ch.%201.pdf • http://hadoop.apache.org/docs/r0.18.0/hdfs_design.pdf • http://wiki.apache.org/hadoop/Defining%20Hadoop • https://engineering.purdue.edu/~ychu/ee673/Projects.F11/detectstraggeler_finalrpt.pdf